In other respects computer simulation complements conventional biology.
Biologists traditionally use the reductionist approach to natural
systems, breaking them down into successively smaller and smaller parts. By
analyzing subsystems, biologists deduce basic mechanisms and make inferences
about general principles underlying the behavior of life on Earth. In contrast,
simulation allows a researcher to reverse the trend and use a synthetic
approach, building life-like systems from the bottom-up rather than top-down
[Langton 89].
Thesis
goal
This thesis describes the design and implementation of a C++ class library[1] called SIMBIOSYS. The goal of developing
this system is to facilitate the construction of simulations that capture the
complex dynamic and adaptive behaviors exhibited by natural living systems. The
reason for creating these simulations is two-fold:
Though still nascent, the field of artificial life has begun to provide the means for better understanding emergent properties. [...] Reductionist science, the method which characterizes most serious academic research today, is largely analytic, breaking things down, and so misses these emergent properties. Reductionist science has been enormously successful in many respects, but many features are being ignored. This occurs not because those features are uninteresting or unimportant. To the contrary, one would love to study them, but the appropriate tools and effective methods for researching them have been lacking. [Taylor 92]
Heinz Pagels writes in his Dreams of Reason:
Possession of a program with unique analytic capabilities puts a scientist in as much of a privileged position to make new discoveries as the possession of a powerful telescope. These cognitive instruments are in the realm of concepts and information; nonetheless they are used by scientists as tools, extensions of our ordinary human analytic capability. [Pagels 89]
At the time of this writing there are no evolutionary simulation libraries in the public domain; clearly there is a need for better simulation tools. Though a system spanning all of biology from biochemistry to macroevolution would be ideal, the scope of SIMBIOSYS has been restricted to the latter end of the spectrum, from ethology to macroevolution, to make the project tractable. The class library lends itself well to problems in evolutionary biology, particularly those dealing with evolutionarily stable strategies, i. e., strategies that do well in populations dominated by the same strategy [Maynard Smith 74]. Examples of these types of problems include the following:
Sexual selection and female choice: the paradox of the evolution of female preferences for otherwise maladaptive traits in males is an old problem [Darwin 71]. The classic example from nature is the peacock. Male peacocks grow enormous, brightly coloured tail feathers which makes them easily spotted by predators and hampers movement. This liability is offset by females preferring to mate with males having long tails.
Sexual vs. asexual reproduction: the origin, maintenance, and prevalence of sexual reproduction is paradoxical because sexual reproduction costs more resources than asexual reproduction. Simulations can be used to explore hypotheses for why natural selection has not removed species that rely on the more costly reproductive method. One such theory suggests that host-parasite coevolution can result in selection for higher recombination rates in the host species [Hamilton 90].
Haploidy vs. diploidy: ploidy refers to the number of sets of chromosomes contained in a cell, haploid for one set and diploid for two. Most sexually reproducing species are diploid and it is unclear what advantages are conferred by having a second set of chromosomes.
Sex ratios in haplodiploid species: some species of social insects (bees, ants, termites) are referred to as haplodiploid because all males are haploid and all females are diploid. Predicting an evolutionarily stable sex ratio is non-trivial because there is a conflict between the queen and workers.
The extended phenotype: the theory of the extended phenotype [Dawkins 83] challenges the central theorem of modern ethology, that organisms behave in such a way as to maximize their inclusive fitness. Dawkins argues that there is no reason that the phenotype (manifestation, or influence) of the genes must stop at the organism's body. Perhaps an organism's behavior benefits a gene located in another organism's cells. Simulations modeling the effects of the extended phenotype would be of great interest.
Emergent computation is characterized by the following constituents:
1. a collection of agents, each following explicit instructions,
2. interactions among the agents which form implicit global phenomena at the macroscopic level, i.e. epiphenomena,
3. a natural interpretation of the epiphenomena as computations.
In nature, the evolutionary process occurs whenever the following four conditions are satisfied:
1. an entity has the ability to reproduce itself,
2. there is a population of such entities,
3. there is some variety among the entities, and
4. some difference in ability to reproduce in the environment is associated with the variety.
Both emergent computation and selective systems can be classified on a dimension representing the abstraction level of the system, from more abstract at one end of the spectrum, to more biologically plausible at the other end as shown in Table 1. Each type is discussed briefly in the following sections.
emergent behavior | selective systems | |
more abstract | cellular automata | genetic algorithms |
hybrid | neural networks | genetic programming |
more biological | neuronal networks | evolutionary programming |
Genetic algorithms
A genetic algorithm (GA) is a general search technique inspired by Darwinian natural selection. It simulates evolution in that the dual processes of variation and selection are used to explore an abstract search space, but in most genetic algorithms there is little emphasis on modeling natural evolution per se [Collins 92b].
A genetic algorithm works by generating an initial population of random bit
strings. The bit strings are evaluated by some arbitrary external fitness
function, and the ones that perform relatively better become more prevalent in
the population, replacing ones that do not perform as well. Variation is
introduced with genetic operators which mimic those found in nature. A more
detailed description of genetic algorithms can be found on p.12.
Genetic programming
Genetic programming is a specialization of genetic algorithms in which the bit strings are interpreted as computer programs. A wide variety of seemingly different problems from many different fields can be reformulated in terms of program induction, that is, searching for an algorithm that will compute some desired output when presented with particular inputs [Koza 92]. Examples include the following:
Optimal control involves finding a strategy that uses the current state variables of a system to choose values for one or more control variables in order to move the system toward some desired state while simultaneously minimizing some cost function.
Planning in artificial intelligence and robotics requires finding a sequence of actions that uses information from sensors about the state of a system to select effector actions which change that state.
Game playing involves finding a strategy that specifies what move a player is to make at each point in a game given known information about the game.
Empirical discovery requires finding a model that relates a given finite sampling of values of the independent variables and the associated values of the dependent variables for some observed system in the real world.
Induction of decision trees is one approach to the problem of classifying an object in a universe into a particular class on the basis of its attributes. A decision tree corresponds to a program consisting of functions that test the attributes of the object.
Automatic programming of cellular automata requires induction of the set of state-transition rules which will generate some desired global behavior.
Evolution of emergent behavior involves finding a set of seemingly simple rules which, when applied repetitively, lead to complex overall behavior. One example is the problem of finding a set of rules governing the behavior of an ant such that a colony of ants operating in parallel will work together to find all available food and transport it back to a nest.
Each of these problems can be attacked by applying a genetic algorithm to a
representation of a computer program. The GA operates on the programs as
passive data, generating a population of random programs to start, then
introducing variation by applying (possibly specialized) genetic operators. The
fitness function is evaluated by running each program in the population, and
comparing its output or behavior against the desired result.
Evolutionary programming
Evolutionary programming is a specialization of genetic programming with
special emphasis on biological realism [Collins 92b]. The bit string is
interpreted as a program that, when decoded and executed, creates an agent
which interacts in an environment. The fitness function is an implicit property
of the environment which often includes other evolving agents. In contrast to
most abstract genetic algorithms, the selection and mating process of the
artificial evolution GA includes spatial structure; a given agent has a higher
probability of mating with another in its proximity.
Cellular automata
Cellular automata (CA) represent emergent behavior in its most abstract form. CA are dynamical systems in which space and time are (usually) discrete [Culik 90, but see MacLennan 90]. The states of cells in a regular n-dimensional lattice are updated synchronously according to a set of deterministic local interaction rules. Each cell obeys the same set of rules and has a finite number of possible states.
CA are well-suited for modeling natural systems that can be described as
massive collections of relatively simple objects interacting locally with one
another. Since von Neumann first attempted to model a self-reproducing system
with CA, they have found diverse applications in modeling complex systems in
physics, chemistry and biology.
Neural networks
A great deal of attention has been given to modeling cognitive processes with emergent computation, often in the form of an artificial neural network. Artificial neural networks, like cellular automata, exploit the emergent computation of a collection of simple processors called formal neurons or simply units.
The units are generally simplified abstractions of biological neurons connected
in a directed graph. Each unit has a numerical activation value which it
computes from the values of neighboring elements using a simple formula. The
connections carry a numerical strength or weight. In an obvious neural
allusion, positive weights are called excitatory and negative weights are
called inhibitory [Smolensky 88]. A more detailed description of neural
networks can be found on p.17.
Neuronal networks
Neuronal networks are distinguished from neural networks by a more
biologically plausible neuron model. Realistic simulations are based upon
mathematical models of the biophysical processes underlying the operation of
nerve cells taking into account such attributes as resting potential, passive
and active membrane properties, and complex synaptic and dendritic interactions
[Beer 90].
Artificial life and artificial intelligence
The field of artificial life brings together the themes of emergent computation and selective systems into a study of human-made systems that exhibit behaviors characteristic of natural living systems. The fields of artificial intelligence and artificial life have much in common. Both reconsider conventional notions by removing them from a natural setting and placing them in a computational perspective.
It has been suggested that artificial life may represent a foundation for artificial intelligence in that the basic repertoire of abilities an organism needs to stay alive can be used as building blocks for higher level intelligence [Belew 91]. Ecologists have long recognized that the complexity of an organism's behavior is related to the complexity of the environment it must "solve". For instance, if food is a plentiful commodity, a random walk may be a sufficient foraging strategy. As food becomes more scarce, some form of perception and locomotion control is necessary for survival. If multiple food types are required, then resource-specific strategies and priority setting (planning) become necessary.
Artificial Life may be used to avoid some of the criticisms leveled against
artificial neural networks which have been criticized from a philosophical
perspective for being ungrounded and therefore meaningless [Dreyfus 88].
Computational neuroethology has been proposed as a method of eliminating the
ad hoc semantics of contemporary connectionist models [Cliff 90]. The
idea is to embed the connectionist model in a sensori-motor system. Future
models should be situated in closed-environment simulation systems; output of
the simulated nervous system is expressed as observable behavior.
Structure of the thesis
Chapter 2, "Survey", begins with more detailed descriptions of genetic algorithms and artificial neural networks. This provides background information for readers unfamiliar with those particular areas. Then an overview of many current evolutionary simulations systems is presented. Emphasis is placed on the aspects relevant to designing a general evolutionary simulation toolkit.
Chapter 3, "Analysis", develops a framework for comparing the systems surveyed in Chapter 2. Common elements are factored out into programmatic abstractions used as the basis for designing the library classes.
Chapter 4, "Design", details the process of applying a user-centered toolkit design methodology to the problem of a evolutionary simulation class library. The principles behind object-oriented programming and class libraries are also discussed.
Chapter 5, "Implementation", describes the sets of features provided by the SIMBIOSYS classes. Following the principles developed in Chapter 4, a set of classes is created to fit into a simulation architecture. This includes flexible abstract base classes and specialized derived classes for the following:
Chapter 7, "Conclusions", summarizes the contributions of the thesis and presents several areas for future work.