Population Viscosity and the Evolution of Altruism

Joshua Mitteldorf* and David Sloan Wilson**
*Leidy Laboratory of Biology
University of Pennsylvania
Philadelphia PA 19104
josh@forum.swarthmore.edu

**Dept of Biology
SUNY Binghamton Box 6000
Binghamton NY 13902-6000
dwilson@binghamton.edu

0. Abstract

The term population viscosity refers to limited dispersal, which increases the genetic relatedness of neighbors. This effect both supports the evolution of altruism by focusing the altruists' gifts on relatives of the altruist, and also limits the extent to which altruism may emerge by exposing clusters of altruists to stiffer local competition. Previous analyses have emphasized the way in which these two effects can cancel, limiting the viability of altruism. These papers were based on models in which total population density was held fixed. We present here a class of models in which population density is permitted to fluctuate, so that patches of altruists are supported at a higher density than patches of non-altruists. Under these conditions, population viscosity can support the selection of both weak and strong altruism.



I. Introduction

Evolutionary altruism is defined to comprise traits carried by an individual which confer a fitness advantage upon others. Strong altruism actually imposes a fitness cost upon the bearer of the trait, while in weak altruism the disadvantage is only experienced relative to others (Wilson, 1980). The emergence of altruism is a fundamental problem in evolutionary biology: How can nature select a gene that promotes the fitness of others, especially when it is at the expense of the bearer of the gene itself?

In order for an altruistic trait to be selected, the benefits of the altruism must fall disproportionately on other altruists. The present work applies computer modeling on a Cartesian grid to explore one very general paradigm: altruistic benefit is dispersed blindly to all occupants of a geographic neighborhood, while the limited rate of population diffusion serves to enhance the proportion of relatives of the altruist within the benefitted region. Previous work with this paradigm seemed to indicate that it would support weak but not strong altruism; but in the present model, we see both weak and strong altruism emerge. The new result depends on a variable overall population density; in particular, the communal benefit of altruism must be such that populations of altruists are supported at a higher local density than corresponding populations of non-altruists.

A number of theoretical frameworks have been developed to describe ways in which altruism can evolve, including multilevel selection theory, inclusive fitness theory, and evolutionary game theory. Sober and Wilson (1998) review these frameworks and their relationship to each other.

Game theory emphasizes the ways in which the benefits of altruism may be focused on other altruists through reciprocal exchange of benefit that recognizes and excludes (or punishes) freeloaders. The "Prisoner's Dilemma" game became one standard model for simulating the evolutionary viability of cooperative strategies in various environments. Axelrod and Hamilton (1981) pioneered computer simulation in this area, sponsoring a competition among strategies that anointed a victor in "Tit-for-Tat" (TFT), a strategy that cooperates or betrays each player depending on that player's most recent behavior toward the protagonist. The result was greeted with encouragement, in that TFT is a strategy that allows for the possibility of cooperation. Recent analysis (Nakamaru, Matsuda & Iwasa, 1997; Nakamaru, Nogami & Iwasa, 1998) has combined game theory with geographic effects to illuminate the distinction between benefits to survival and to fertility from cooperation. The game theory paradigm is limited in applicability to higher animals in which the brain is developed sufficiently to support the recognition of individual others.

Inclusive fitness theory is most often applied to situations in which altruists and the recipients of altruism are genetically related in a known way. In some circumstances, Hamilton's rule (Hamilton, 1964) provides an exact measure of the degree to which altruism can evolve. The rule states that the maximum degree of altruism that can evolve is directly proportional to the coefficient of relatedness between the altruist and the recipient of its behavior (b/c>1/r, where b=benefit to recipient, c=cost to altruist, and r=the coefficient of relatedness).

A related analytic approach borrows a technique from statistical analysis to describe clustering in terms of correlation coefficients for pairs, triples, and higher order geometric combinations. This method, first applied by Matsuda (1987; Matsuda et al., 1987), is most useful when the lowest terms can be shown to be an adequate approximation to the behavior of the full system. Van Baalen and Rand (1998) are able to place weak limits on the evolution of altruism by focusing on the pair correlation alone. But in the strongly clustered environments typical both of the biosphere and its models, the pair correlation alone is of limited utility.

Multilevel selection (MLS) theory treats natural selection as a hierarchical process, in which the relative advantages of altruism and selfishness occur at different levels. For many kinds of altruistic traits, the levels may be associated with spatial scales. The advantage of selfishness is local - selfish individuals are more fit than altruists in their immediate vicinity because they receive the benefits without paying the costs. The advantage of altruism is realized on a larger scale - groups of altruists are more fit than groups of selfish individuals. MLS theory seeks to analyze the balance between the local and the global processes, and thus to predict circumstances under which altruism may evolve. For altruistic traits whose benefits are focused in a geographic locality, it is necessary that altruists cluster, so that the benefits conferred by an altruist are more likely to fall upon other altruists.

Analysis is simplified by the assumption of discrete and spatially separated groups; then the conditions favoring the evolution of altruism may be defined. First, the groups must vary in their frequency of altruists: the more variation the better. Second, the groups must be competing against one another, with some groups growing in size and others shrinking or vanishing. The relative strength of selection within groups (favoring selfishness) and between groups (favoring altruism) is determined by the relative timescales for extinction of groups to take place and for selfishness to evolve to fixation within groups. Third, the altruistic groups must be able to export their genes to the remainder of the global population. In the absence of a global dispersal mechanism the abundant progeny of altruistic groups remain in the same locality to compete only against one another. One ideal population structure for the evolution of altruism invokes periods of isolation, during which groups of altruists share their mutual benefit, alternating with periods of dispersal, allowing for the export of the altruistic groups' superior productivity. This scenario, which may be called "alternating viscosity", is explicitly assumed by trait group models in multilevel selection theory, and implicitly assumed by most game theory and inclusive fitness theory models (Sober & Wilson, 1998, and references therein).

But nature provides abundant examples of population structures that do not alternate between isolation and dispersal, and for which neither explicit kin selection nor reciprocal exchange is a natural model. In order to extend our understanding to encompass these phenomena, it is desirable to avoid explicit assumptions about relatedness, grouping and the timing of dispersal, and to allow all these concepts to emerge as a consequence of a general geographic structure. In our models of population viscosity, "groups" are loose associations, where patches tend to be dominated for a time by one variety or another only because of the limited speed at which siblings disperse. These models resist analytic treatment. Application of Hamilton's rule encounters two difficulties: first, there is no easy way to gauge local relatedness; second, a global fitness measure becomes elusive when reproductive success depends on the strength of local competition. Computer simulation may be an appropriate tool for approaching this problem's irreducible complexity; in any case, simulation results can serve as a stimulus to analytic thought, and as one test of its verisimilitude.

A study by Wilson, Pollock and Dugatkin (1992) (WPD) indicated that computer models of population viscosity could support the selection of weak altruism, but that strong altruism was not competitive in this environment. Taylor (1992) reached a similar conclusion without computer simulation, analyzing a model in which discrete groups approximate the effects of viscosity. Independently, Nowak and May (1992, 1993) described an early grid model. Though they focused on the mathematical properties of their model to the exclusion of biological implications, their results also may be interpreted to permit weak but not strong altruism to evolve. Like WPD, they considered only fixed total population densities; but their cost/benefit scheme was structured in a somewhat different way, enhancing the prospects of altruists at low densities.

In the present paper, the WPD model is taken as a starting point. Altruism is modeled as having a cost c and a benefit b that contribute linearly to fitness. The cost is borne by the altruist alone, and the benefit shared equally by the altruist and its four lattice neighbors. Competition for each lattice site in each (non-overlapping) generation takes place among a group of five neighbors having a similar geometry. For the purpose of calculating altruistic contributions, each individual is counted as the center of its own neighborhood; thus every lattice site is surrounded by a local gene pool consisting of itself and four neighboring sites, with sites further afield affecting the competition indirectly via their influence on the fitness of the four neighbors.

WPD derive a version of Hamilton's rule appropriate to this model: the fitness of the average altruist in the global population exceeds that of the average non-altruist so long as

b/c > 1/V                                                                                                         (1)

where V is a statistical analog of Hamilton's relatedness variable, r. Specifically, V is the average of the pair correlation coefficient over the 5 recipients of the altruist's benefit. One of these is the self, with correlation 1, and 4 are lattice neighbors, with correlation R, hence

V = (1+4R)/5                                                                                                    (2)

(The pair correlation R may be defined by the usual statistical prescription, and can be shown to equal the difference between the probabilities of finding an altruist when looking adjacent to a random altruist and adjacent to a random non-altruist. This identity is detailed in Appendix 1.)
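
As a rough illustration of equations (1) and (2), the following Python sketch measures R on a small saturated grid as the difference of the two conditional probabilities described above, then converts it to V and to the threshold value of b/c. The function names and the striped sample grid are hypothetical choices made for this example; they are not taken from the WPD code.

```python
# Illustrative sketch only: names and the sample grid are assumptions.
PLUS_NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # N, S, W, E

def pair_correlation(grid):
    """R = P(A beside a random A) - P(A beside a random S), with wrapped edges.

    Assumes the grid is saturated and contains at least one A and one S.
    """
    rows, cols = len(grid), len(grid[0])
    # focal type -> [number of A neighbors seen, number of neighbors seen]
    counts = {"A": [0, 0], "S": [0, 0]}
    for i in range(rows):
        for j in range(cols):
            focal = grid[i][j]
            for di, dj in PLUS_NEIGHBORS:
                neighbor = grid[(i + di) % rows][(j + dj) % cols]
                counts[focal][0] += neighbor == "A"
                counts[focal][1] += 1
    return counts["A"][0] / counts["A"][1] - counts["S"][0] / counts["S"][1]

def hamilton_v(r):
    """Equation (2): average correlation over self plus four neighbors."""
    return (1 + 4 * r) / 5

def threshold_benefit_cost(r):
    """Equation (1): altruists are fitter on average when b/c exceeds 1/V."""
    return 1 / hamilton_v(r)

# Two vertical stripes of width 2 on a 4 x 4 torus.
striped = ["AASS", "AASS", "AASS", "AASS"]
r = pair_correlation(striped)
print(r, hamilton_v(r), threshold_benefit_cost(r))
```

In the striped grid every altruist sees three altruists among its four neighbors and every non-altruist sees one, so R is 0.75 - 0.25 = 0.5 and V is 0.6.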

WPD found that the altruistic trait did not fare so well in their simulation as this version of Hamilton's rule would have predicted. They explained this result with reference to the lattice viscosity: the same limited dispersal that concentrated altruists in patches also impeded their ability to export their offspring to distant parts of the grid. The result may also be understood in terms of overlap between the group sharing an altruist's benefit and the group which competes directly for each site in the daughter generation: When the two groups are identical, the altruistic benefit applies to all competitors equally, thus it has exactly no effect. When there is no overlap between these two groups, the above version of Hamilton's rule predicts accurately the conditions under which altruism may prevail. In the present model, there is partial overlap, and thus altruism has some selective value, but not the full power predicted by Hamilton's rule. (This argument is made quantitative in Appendix 2.)

In the present work, we replicate the 1992 results of WPD and broaden their analysis by relaxing one key assumption: the requirement of constant population density. This modification addresses the issue of the altruists' excess productivity having no place to go, and does so in a way that is biologically reasonable, at least for some expressions of altruism. In principle, the number of occupants of each lattice site could be any integer, a free variable of the model; but we reserve this extension for future investigation. In our model, each site may be empty, or it may be occupied by a single altruist or non-altruist. Variable population densities may be observed over areas sufficiently large that the concept of density as a continuous variable has some meaning, but sufficiently small that they are approximately uniform in composition. The original rules of the WPD model have been extended in three different ways in order to support the presence of some empty cells in the long run, without causing the entire population to vanish.

With each of these three variations, we find that there are ranges of the parameters for which altruism is a viable competitor to selfishness, including some in which altruism evolves to fixation. Both strong and weak altruism may be supported under a broad set of assumptions. In fact, we find that the strong/weak distinction has little place in our understanding of the forces that determine whether altruism can prevail. We conclude that if the benefit of an altruistic trait enables a larger population density to support itself in a given environment, then with appropriate ratios of benefit to cost, that trait may evolve based upon population viscosity alone.

Note added in proof:  Similar conclusions to the above were reached via a very different path by Ingvarsson (1999).

II. The WPD Model

Two types of asexual individuals exist in the population, differing only in their altruistic (A) vs. selfish (S) traits. The populations are arrayed at lattice intersections of a two-dimensional Cartesian grid, 200 x 200 sites in most runs, with opposite edges connected to eliminate boundary effects. Each point on the grid represents a spatial location that can be occupied by a single individual, either A or S. (In the 1992 WPD model, vacancies were not permitted.) Interactions are local, such that the absolute fitness of an individual depends on its neighborhood of five grid points, which includes its own location plus the four neighboring points occupying the adjacent north, east, south and west positions on the grid.

WA = fitness of altruist = 1 - c + NA·b/5
WS = fitness of non-altruist = 1 + NA·b/5                                                                         (3)

where NA/5 is the average proportion of altruistic neighbors (the plus-shaped neighborhood of five includes the self).

Each altruist contributes b/5 fitness units to everyone in the neighborhood, including itself, at a personal cost of c. Selfish individuals enjoy the benefits conferred by their altruistic neighbors without paying the cost. Altruists decrease their relative fitness within their neighborhood when c>0 (weak altruism) and decrease their absolute fitness when c>b/5 (strong altruism).

The model proceeds in fixed, non-overlapping generations. Every site is vacated at the end of each generation, and a new occupant is chosen as a clone of one of five individuals in the neighborhood comprising the central site itself and neighbors to the north, east, south, and west. These five individuals compete for the site with probabilities proportional to their fitnesses as computed in equation (3). Note that five overlapping neighborhoods are used to compute contributions to the fitnesses of the five participants in the lottery that determines which will seed the central site in the new generation.
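
The generation rule just described can be sketched in a few lines of Python. This is a minimal reading of the text, not the authors' program: the function names are hypothetical, the grid is a list of rows of "A"/"S" characters with wrapped edges, and the sketch assumes parameter values for which every fitness is non-negative (for example c < 1), since the source does not say how negative fitnesses would be handled.

```python
import random

# Self plus the four lattice neighbors (N, S, W, E).
NEIGHBORHOOD = ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1))

def fitness(grid, i, j, b, c):
    """Fitness of the occupant of (i, j): base value plus b/5 per local altruist."""
    rows, cols = len(grid), len(grid[0])
    n_a = sum(grid[(i + di) % rows][(j + dj) % cols] == "A"
              for di, dj in NEIGHBORHOOD)
    base = 1 - c if grid[i][j] == "A" else 1
    return base + n_a * b / 5

def next_generation(grid, b, c, rng):
    """Refill every site by a 5-way lottery weighted by parental fitness."""
    rows, cols = len(grid), len(grid[0])
    # Each fitness is computed once, from the neighborhood centered on that site.
    w = [[fitness(grid, i, j, b, c) for j in range(cols)] for i in range(rows)]
    new_grid = []
    for i in range(rows):
        new_row = []
        for j in range(cols):
            sites = [((i + di) % rows, (j + dj) % cols) for di, dj in NEIGHBORHOOD]
            types = [grid[r][s] for r, s in sites]
            tickets = [w[r][s] for r, s in sites]
            # Assumption: all tickets are non-negative (holds when c < 1).
            new_row.append(rng.choices(types, weights=tickets)[0])
        new_grid.append(new_row)
    return new_grid

rng = random.Random(0)
grid = [[rng.choice("AS") for _ in range(20)] for _ in range(20)]
for _ in range(10):
    grid = next_generation(grid, b=0.5, c=0.2, rng=rng)
print(sum(row.count("A") for row in grid) / 400)
```

All fitnesses are computed from the parent grid before any site is refilled, which matches the non-overlapping generations of the model.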

The structure of this model was suggested by a population of annual plants. A given area will support a finite number of such plants in each year, and all the plants in a neighborhood compete to seed the site at the center of that neighborhood in the coming year. The fitness values as computed in (3) correspond in this picture to the quantity of seeds produced by each variety of plant. All the seeds produced by plants in a given neighborhood form a local gene pool, from which only a single plant will survive to occupy a given site. The hypothetical altruistic trait permits all plants in a neighborhood around the altruist plant to generate more seeds.

                    SASSS
                    AASAA
                    ASAAS
                    ASASA
                    SSSAA

Fig 1: A sample segment of a saturated grid. A's are sites occupied by altruists and S's are non-altruists. The base fitness of each S is 1, and the base fitness of each A is (1-c). To this base is added a multiple of b contributed by the count of A's in the 5-site neighborhood (shaped like "+") centered on the target site.


Here is a numerical illustration based on Fig. 1: The position in the center is occupied by an A, which has two A and two S individuals as neighbors. If c=0.2 and b=0.5, then, from equation (3), the focal individual has a fitness of 1-.2+.3=1.1. The fitness of the four neighbors can also be calculated from equation (3), remembering that each individual occupies the center of its own neighborhood. Adding the fitnesses of A's and S's separately, we find 3.2 fitness units for A and 2.6 fitness units for S. These fitness units may stand for seeds in the neighborhood, or it is also convenient to think of them as tickets for a lottery. Since A's hold 3.2 lottery tickets to 2.6 for S's, the probability that the central site will be occupied by an A in the next generation is 3.2/5.8=0.552. Notice that the frequency of A among the offspring produced by the neighborhood (0.552) is a decline from the parental value of 3/5=0.600, which illustrates the local advantage of selfishness. The simulation repeats this procedure for every position of the grid to determine the population array for the next generation.
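
The arithmetic of this illustration can be checked with a short Python sketch. The grid literal is copied from Fig. 1; the function names are hypothetical. No edge wrapping is needed here because the central site and its four neighbors only ever look at sites inside the 5 x 5 segment.

```python
# Fig. 1, row by row. Names below are illustrative, not from the WPD code.
FIG1 = ["SASSS",
        "AASAA",
        "ASAAS",
        "ASASA",
        "SSSAA"]
PLUS = ((0, 0), (-1, 0), (1, 0), (0, -1), (0, 1))  # self, N, S, W, E

def fitness(grid, i, j, b, c):
    """Base fitness (1-c for A, 1 for S) plus b/5 per A in the plus-shaped neighborhood."""
    n_a = sum(grid[i + di][j + dj] == "A" for di, dj in PLUS)
    return (1 - c if grid[i][j] == "A" else 1) + n_a * b / 5

def lottery_tickets(grid, i, j, b, c):
    """Total fitness held by A's and by S's among the five competitors for (i, j)."""
    tickets = {"A": 0.0, "S": 0.0}
    for di, dj in PLUS:
        r, s = i + di, j + dj
        tickets[grid[r][s]] += fitness(grid, r, s, b, c)
    return tickets

tickets = lottery_tickets(FIG1, 2, 2, b=0.5, c=0.2)
p_a = tickets["A"] / (tickets["A"] + tickets["S"])
print(round(tickets["A"], 3), round(tickets["S"], 3), round(p_a, 3))
# prints: 3.2 2.6 0.552
```

The three altruists in the lottery hold 1.1, 1.0 and 1.1 tickets, and the two non-altruists hold 1.3 each, which reproduces the totals quoted in the text.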

IIB-C Models with Variable Population Densities: Disturbance and Regrowth

Fitness of an organism has meaning only in relation to a particular environment. Enhanced fitness may manifest as a greater exponential growth rate in free population expansion, or as a greater carrying capacity in a saturated niche, or as a more robust response to any number of challenges. Our extensions of the WPD model are designed to permit the representation of such effects with minimal changes in the model's rules.

In the original version, every lattice site was occupied in every generation by exactly one individual. A minimal modification would permit some sites to be vacant. The rules governing the lottery then need to be generalized to allow for vacant sites (V's) as well as A's and S's. The simplest rule would be to assign a constant "fitness" h to the void; each V participating in a lottery would receive h tickets, independent of its surrounding neighborhood. The language describing V's as a separate species with its own fitness is a convenient mathematical fiction, akin to the symmetric treatment of negative electrons and positive holes in a solid state physicist's equations for a semiconductor. The rationale for this prescription is simply that if a neighborhood is less than fully occupied in the parent generation, there is a finite probability that reproduction will fail to create any occupant for its central site in the daughter generation.

In two-way competition between S's and V's, the V's have the upper hand when h>1, and the grid quickly empties out. For h=1, the population in two-way competition is marginally unstable, and eventually the grid evolves to saturation (all S) or extinction (all V). With h<1, the population grows always toward saturation, which would reduce the model to an exact replica of WPD. The first two possibilities did not appear promising; however the third, with h<1 and a population density that always grows, can be the basis for an interesting model, so long as a mechanism is added to replenish the population of void sites.
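
The three regimes of h can be illustrated with a well-mixed (mean-field) approximation, sketched below in Python. This is an assumption made for illustration only: it ignores spatial structure and random drift, so it shows h=1 as exactly neutral, whereas on the grid chance fluctuations eventually carry the population to saturation or extinction. The function names are hypothetical.

```python
def mean_field_step(rho, h):
    """One generation of S vs V in a well-mixed population.

    A lottery holds on average 5*rho tickets for S (fitness 1 each) and
    5*(1-rho)*h tickets for V, so the next density is the S share of tickets.
    """
    return rho / (rho + (1 - rho) * h)

def iterate(rho, h, generations):
    """Apply the mean-field map repeatedly, starting from density rho."""
    for _ in range(generations):
        rho = mean_field_step(rho, h)
    return rho

for h in (0.8, 1.0, 1.25):
    print(h, iterate(0.5, h, 200))
```

In this approximation the odds rho/(1-rho) are multiplied by 1/h each generation, so the density climbs toward 1 for h<1 and collapses toward 0 for h>1.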

One way to maintain a population of V's on the grid is to introduce periodic disturbance events. On a fixed schedule, a percentage of all grid sites is vacated, as if by a disease or natural catastrophe, or a seasonal change that kills off a high proportion of individuals in the winter and permits regrowth in the summer. The geometry of the disturbance may be that a fixed proportion of sites is chosen at random for evacuation ("uniform culling"); or at the opposite extreme, all the sites in one area of the grid may be vacated, while the sites outside this region remain untouched ("culling in a compact swath"). Both these cases have been explored, and results are reported below in Sections IIIB and IIIC, respectively.

For the uniform culling case, we choose parameters such that a high proportion of the population is culled, denuding the grid except for isolated individuals. This maximizes the founder effect (first described by Mayr, 1942), as regeneration takes place in patches that are pure A or pure S, because they are descended from a single survivor. As patches begin to regenerate, they expand into surrounding voids, and direct competition is delayed for a time. In the initial phase of regrowth, it is the competition A vs V and S vs V that dominates; only later do A's and S's compete head-to-head. But A communities, by our assumptions, can grow faster than S communities; hence the A's may compile a substantial numerical advantage before direct encounters (A upon S) become commonplace. This process is illustrated in Fig. 2. Detailed results for this variation are reported in Section IIIB.

[Figure 2: four panels, a) through d), described in the caption that follows]

Fig 2:  Red dots are altruists; green are non-altruists; white dots are empty sites.  Periodic culling of the population creates growth opportunities in which altruistic colonies may outstrip colonies of non-altruists, though the non-altruists come out best in direct competition.
    a) The grid is initialized in a thoroughly mixed state, 1/3 each altruists, non-altruists and empty sites.
    b) A disturbance is introduced, destroying 95% of the population at random.  Regrowth has just started, and isolated individuals have created small patches of homogeneous type.
    c) Growth into the voids continues, with patches of altruists growing more rapidly than non-altruists.
    d) After 5 cycles of disturbance and regrowth, altruists have come to dominate the landscape.
With other parameter combinations, either altruists or non-altruists may
The other possibility we have explored is culling in a compact swath. A square centered on a random point and sized to include half the grid's total area is completely denuded in each disturbance event. Regeneration takes place from the edges, and regions of the boundary that, by chance, are dominated by A's expand most rapidly into the void. This process is illustrated in Fig. 3. Although the founder effect is evoked less explicitly in this variation than in the uniform culling case above, culling in a swath nevertheless proves to be a surprisingly efficient mechanism for segregating the population into patches of pure A and pure S. Detailed results for this variation are reported in Section IIIC.
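
The two disturbance geometries, uniform culling and culling in a compact swath, might be implemented along the following lines. This Python sketch is an assumption-laden reading of the description: the function names are hypothetical, empty sites are marked "V", the uniform version vacates an exact count of randomly chosen sites, and the swath is placed by picking a random corner on the wrapped grid, which is equivalent to picking a random center.

```python
import math
import random

def cull_uniform(grid, fraction, rng):
    """Vacate a fixed proportion of all sites, chosen at random."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [list(row) for row in grid]
    doomed = rng.sample(range(rows * cols), round(fraction * rows * cols))
    for k in doomed:
        new_grid[k // cols][k % cols] = "V"
    return new_grid

def cull_swath(grid, rng, area_fraction=0.5):
    """Vacate one square block covering about area_fraction of the grid."""
    rows, cols = len(grid), len(grid[0])
    # Side length of a square with the requested area, rounded to whole sites.
    side = round(math.sqrt(area_fraction * rows * cols))
    top, left = rng.randrange(rows), rng.randrange(cols)
    new_grid = [list(row) for row in grid]
    for di in range(side):
        for dj in range(side):
            # Modular indexing wraps the square around the grid edges.
            new_grid[(top + di) % rows][(left + dj) % cols] = "V"
    return new_grid

rng = random.Random(1)
full = [["A"] * 20 for _ in range(20)]
print(sum(row.count("V") for row in cull_uniform(full, 0.95, rng)))  # 380 of 400 sites
print(sum(row.count("V") for row in cull_swath(full, rng)))          # 14 x 14 = 196 sites
```

Because the side length is rounded to a whole number of sites, the swath covers slightly less than half of the grid (196 of 400 sites in this small example).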

[Figure 3: four panels, a) through d), described in the caption that follows]


Fig 3:  Disturbances that obliterate large patches of the grid prove to be an efficient mechanism for segregating the two varieties.
    a) The grid is initialized in a thoroughly mixed state, 1/3 each altruists, non-altruists and empty sites.
    b) A square disturbance covering half the grid area is introduced, within which all life is destroyed.  Notice that after 5 timesteps, there has already been visible consolidation, with patches of altruists and non-altruists beginning to segregate.
    c) The void is re-colonized from the edges.  Patches of altruists are able to grow more quickly than non-altruists.  Meanwhile, areas away from the disturbance experience direct red-on-green competition, in which altruists are at a disadvantage.
    d) Toward the end of just one cycle of disturbance and regrowth, the disturbance has had two effects:  enhancing the fortunes of altruists, and helping to segregate the population into separate regions dominated by each variety.
    If disturbances continue on a regular schedule but centered at random locations on the grid, altruists may evolve to fixation.

IID. Models with Variable Population Densities: Unsaturated Steady States

In the above model variations, the fitness of the void h<1 was chosen in such a way that populations of A or S would grow and eventually eliminate voids in the absence of disturbance; disturbance events were inserted in the model in order to keep the result from degenerating into the saturated-grid case, with constant population density. These models correspond to population structures that are subject to seasonal cycling, or may be culled periodically by disturbances. Another case worth exploring avoids periodic culling. Voids are distributed through the grid and compete more equally with A's and S's, battling to a steady state distribution even in the absence of disturbances. Communities of A's and S's may have different efficiencies in using environmental resources, which is reflected in a lower average vacancy rate in areas dominated by A's than in corresponding areas dominated by S's. In order to model such situations, it is necessary to arrange parameters so that A, S and V can coexist in steady state. Fig. 4 illustrates a competition in which denser patches of A's are able to hold their own against sparser patches of S's.

[Figure 4: four panels, a) through d), described in the caption that follows]

Fig 4:  This grid is initialized with 1/3 each altruists, non-altruists and empty sites, randomized pointwise so there are no local concentrations.  There are no large disturbances in this run, but rules of the 5-point lotteries have been modified, so that there is always a finite minimum probability that the site will be empty in the coming generation.  Some spontaneous segregation is already visible in (b), allowing colonies of altruists to pursue their local advantage.  By square (d), it is apparent that altruists fill their regions efficiently (98.7%) while regions dominated by non-altruists are 50% void.  By chance, void areas sometimes appear as "mini-disturbances" near the borders between regions dominated by altruists and non-altruists, and when they do, the altruists expand rapidly into the empty region.  Even in border regions where altruists confront non-altruists directly, the altruists are arrayed along the border with twice the density of non-altruists, and this can vitiate the advantage of the freeloaders.  The altruists may come to dominate the competition.  In this example, altruists will evolve to fixation in several hundred time steps.

The simplest assumptions about voids, h<1 or h>1, fail to generate the steady state that we are looking for. The case h=1 may work as a steady state for a short time, but it is unstable: the grid evolves eventually either to saturation (no voids) or extinction (all voids). One simple way to engineer a stable steady state is by introduction of a second parameter, which we call x. The parameter x represents an extra presence in every 5-way lottery, an acknowledgement that there is always a finite chance that seeding will fail and a site will become vacant, even if its neighborhood is saturated in the parent generation. Specifically, the rules of the lottery are modified so that in addition to the 5 competitors comprising the A's, S's and V's in a given neighborhood, there is always an extra measure of x lottery tickets assigned to V. Intuitively, it is clear that V can never become extinct, i.e., the grid will not become saturated, because each lottery has a guaranteed minimum probability of generating a V. But if h is small enough, then A's and S's will still have an advantage in each lottery sufficient to ensure that they do not succumb to V in competition. In fact, we find that with proper choice of h and x, a stable steady state results, in which V's are always distributed through the population of A's and S's, and the populations avoid the extremes of extinction and saturation.
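
A single lottery under the modified rules can be written out as follows. This is a small Python sketch with hypothetical names; it takes the summed fitness tickets of the A's and S's in a neighborhood as given, and adds h tickets for each void in the neighborhood plus the extra x tickets for V.

```python
def lottery_probabilities(tickets_a, tickets_s, n_void, h, x):
    """Outcome probabilities for one site in the daughter generation.

    tickets_a, tickets_s: summed fitnesses of the A and S competitors.
    n_void: number of vacant sites among the five competitors.
    h: tickets held by each vacant site; x: extra tickets always given to V.
    """
    tickets_v = n_void * h + x
    total = tickets_a + tickets_s + tickets_v
    return {"A": tickets_a / total,
            "S": tickets_s / total,
            "V": tickets_v / total}

# A saturated neighborhood of five non-altruists (fitness 1 each) with x = 0.5:
p = lottery_probabilities(tickets_a=0.0, tickets_s=5.0, n_void=0, h=0.5, x=0.5)
print(p["V"])  # 0.5 / 5.5, the chance of a void arising with no void parent
```

With x=0 the same saturated neighborhood could never produce a void, which is why the extra tickets are what keep V from going extinct.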

Exploration of this four-parameter system (b, c, h and x) is the subject of Section IIID below. The model supports the emergence of altruism in appropriate parameter ranges, and favors quasi-periodic population oscillations for other ranges. Overall, we have found it to be a rich lode of subtle and unexpected phenomena.

Although the full three-species system (S-A-V) displays such complex behavior, the two-component systems A-V and S-V are quite amenable to analysis. For a system with only non-altruists and voids, the population tends to a steady state with a density approximated by

ρss = 1 - x/(5(1-h))                                                                                      (4)

The corresponding system with just altruists and voids seeks a steady-state density approximated by the quadratic formula

$$\rho_{ss} = \frac{-B + \sqrt{B^2 - 4AC}}{2A} \tag{5}$$

where

A = 4b
B = 5(1-c-h) - 3b
C = -5(1-c-h) + x - b

These formulae are derived in Appendix 3. They have served us as guidelines for selection of parameter combinations to explore; specifically the relative advantage of altruist communities in ρss has proven a good predictor of the success of altruism in the simulated competition.
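
The two steady-state formulas are easy to evaluate numerically. The Python sketch below uses hypothetical function names and one arbitrary parameter set. As a consistency check it also evaluates a well-mixed balance condition, 5(1-ρ)(W-h) = x with W = 1 - c + b(1+4ρ)/5, which reproduces the coefficients A, B and C when expanded; that condition is an assumption of this sketch, and the authors' own derivation is the one in Appendix 3.

```python
import math

def rho_ss_selfish(h, x):
    """Equation (4): steady-state density of a population of S's and V's."""
    return 1 - x / (5 * (1 - h))

def rho_ss_altruist(b, c, h, x):
    """Equation (5): steady-state density of a population of A's and V's."""
    A = 4 * b
    B = 5 * (1 - c - h) - 3 * b
    C = -5 * (1 - c - h) + x - b
    # Assumes a real root exists, i.e. B*B - 4*A*C >= 0.
    return (-B + math.sqrt(B * B - 4 * A * C)) / (2 * A)

def balance_residual(rho, b, c, h, x):
    """Zero when a well-mixed A-V population of density rho reproduces itself."""
    w = 1 - c + b * (1 + 4 * rho) / 5  # mean altruist fitness at density rho
    return 5 * (1 - rho) * (w - h) - x

b, c, h, x = 0.5, 0.2, 0.5, 0.5  # strong altruism, since c > b/5
rho_s = rho_ss_selfish(h, x)
rho_a = rho_ss_altruist(b, c, h, x)
print(rho_s, rho_a, balance_residual(rho_a, b, c, h, x))
```

For this parameter set the non-altruist density is 0.8 and the altruist density is about 0.87, the kind of density advantage that the text identifies as a predictor of success for altruism.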

The four parameters b, c, h and x permit ample room to observe an interesting range of variations in the simulation results. We are most concerned here with strong altruism, c>b/5, where the constant-density WPD model found a consistent advantage for the non-altruists. When the average density ρss for patches of altruists is significantly higher than the corresponding density for patches of non-altruists, we find parameter ranges where strong altruism can compete successfully. These results are detailed in Section IIID.

III Results

We have seen that much of the interesting detail relevant to the success of altruism is contained in the part of parameter space close to b=5c, the boundary that divides weak from strong altruism. Only for b<5c does the individual pay a net cost for its altruism, lowering its own absolute fitness in order to benefit its neighbors. In order to magnify the region of parameter space near b=5c for further study, we introduce the parameter γ (gamma), defined as the quotient of the total benefit to others by the net cost to the self.

$$\gamma = \frac{4b/5}{c - b/5} = \frac{4b}{5c - b} \tag{6}$$

The numerator, 4b/5, is the part of each altruist's benefit that is exported to neighbors, while the denominator, c-b/5, is the net cost, since 1/5 of the benefit devolves upon the self. γ=1 is an absolute lower limit in the sense that it is never possible for altruism to evolve when the net effect on fitness of self plus neighbors is negative. At the other extreme, the threshold of strong altruism is 5c-b=0, corresponding to infinite γ. γ is a convenient measure of the hurdle which selection must leap in order to select strong altruism.
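
The benefit-to-net-cost ratio of equation (6) and its two limiting cases can be checked with a few lines of Python. The function name is a hypothetical choice, and returning infinity for weak altruism (no net cost to the self) is a convention adopted for this sketch.

```python
def gamma(b, c):
    """Equation (6): benefit exported to neighbors divided by net cost to self."""
    net_cost = c - b / 5
    if net_cost <= 0:
        # At or beyond the threshold 5c - b = 0 the altruist pays no net cost.
        return float("inf")
    return (4 * b / 5) / net_cost

print(gamma(1.0, 1.0))   # b = c: self plus neighbors break even, ratio is 1
print(gamma(0.5, 0.2))   # a strong-altruism example, since c > b/5
print(gamma(0.5, 0.1))   # exactly on the boundary b = 5c
```

The first case shows why the ratio equal to 1 marks the lower limit: 4b/(5c-b) = 1 exactly when b = c, the point at which the total benefit b handed out equals the cost c.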

In the results tabulated below, we focus on "Stalemate γ" as a measure of each model's hospitability to altruism. Stalemate γ is located as the end result of multiple trials, seeking that combination of b and c for which altruists and non-altruists have equal prospects, and for which the two populations may persist for many simulated generations. Alternatively, there were some parameter combinations for which the competition proved unpredictable and the victor in individual runs not well determined; for these cases, stalemate γ was derived from a pair (b,c) for which A and S were equally likely to prevail. Typically five to ten trials were used to determine each value of stalemate γ, and for many cases an automated trial-and-error routine was used to adjust b (for fixed c) depending on whether A or S prevailed in previous trials.
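
The source does not specify how the automated adjustment of b worked. One plausible scheme, offered here purely as an assumption, is bisection: start with one value of b at which non-altruists prevail and one at which altruists prevail, and repeatedly test the midpoint. In the Python sketch below, find_stalemate_b and the altruists_win callback are hypothetical names; in real use the callback would run the grid simulation, probably several times, and report which variety prevailed.

```python
def find_stalemate_b(altruists_win, c, b_low, b_high, tolerance=1e-3):
    """Bisect on b for fixed c.

    Assumes non-altruists prevail at b_low and altruists prevail at b_high.
    altruists_win(b, c) must return True when altruists prevail.
    """
    while b_high - b_low > tolerance:
        b_mid = (b_low + b_high) / 2
        if altruists_win(b_mid, c):
            b_high = b_mid   # altruists still win: stalemate lies lower
        else:
            b_low = b_mid    # non-altruists win: stalemate lies higher
    return (b_low + b_high) / 2

# Stand-in for the simulation: a deterministic threshold, used only to
# exercise the search. It is not a model result.
def toy_outcome(b, c):
    return b > 4.78 * c

print(find_stalemate_b(toy_outcome, c=1.0, b_low=0.0, b_high=10.0))
```

Because actual runs are stochastic near the stalemate point, a practical version would need to decide each step from a majority over repeated trials rather than from a single run.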

IIIA. Replication of WPD model results

In Table 1, corresponding to Fig. 5, c was an input parameter, held fixed while b was varied until a stalemate condition was found between A's and S's, and g was computed from b and c. The last column, labeled "Neighbor R" is the autocorrelation coefficient for pairs of neighboring sites, and is the same R from equation (2) above.

Table 1

    c      Stalemate b   Stalemate g   Neighbor R
   0.01      ~0.05          large         0.54
   0.03      ~0.15          large         0.59
   0.10      ~0.50          large         0.66
   0.30       1.475          240          0.55
   1.00       4.78            91          0.51
   3.00      13.6             43          0.49
  10.00      43.9             33          0.47

Fig 5: Gamma is a measure of the minimum benefit/cost ratio required for strong altruism to be viable. The definition is such that all gamma<infinity correspond to strong altruism, but lower gamma means that conditions are more favorable for evolution of altruism. These results show that even with a saturated grid and no culling, strong altruism can become viable as cost and benefit are scaled up in tandem.

It was the primary conclusion of WPD that b=5c (g=∞) was a line of demarcation, at which altruism was marginally viable against selfish behavior; for larger b the altruists would evolve to fixation, and for smaller b they would perish. Taylor's (1992) analytic model found the same boundary b=5c to be critical, and his derivation relies upon the assumption that b and c are both small. With larger grids and longer runs, we are able both to confirm that the rule holds for small b and c and to explore some of the region where the rule fails. When b and c are both small compared to 1, the minimum g to support altruism approaches infinity. But for larger b and c, altruism may evolve to fixation for finite g (even in a saturated grid). For c=1, altruism will prevail if b>4.78, corresponding to g>91. But if c=10, then b need only exceed 43.9, corresponding to g>33.

Fig. 5 shows that the boundary between weak and strong altruism is not as fundamental as suggested by Taylor and WPD. In fact, we argue in Section IV below that the demarcation at b=5c is an artifact of the geometry chosen by WPD, and has no deep significance.

IIIB. Periodic uniform culling

In these runs, a high percentage of the population was culled at random on a regular schedule, such that much of the action consisted of regrowth from isolated founders into empty surroundings. No x variable is required for this model (recall that x was introduced in order to engineer a stable steady-state population density less than unity). However, the "fitness of the void" h was an essential parameter of the model, and h was varied from 0.5 to 0.92. In one set of trials, 95% of the occupied sites were randomly voided every 50, 100 or 200 generations; in a separate set, 50%, 75%, 90%, 95%, 96.5%, and 98% were culled every 100 generations.
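The culling step itself is simple to state in code. The following is a minimal sketch under our own assumptions: the grid is a list of lists holding "A", "S" or "V" (void), and the name `cull_dispersed` is hypothetical.

```python
import random

def cull_dispersed(grid, fraction, rng):
    """Void a random `fraction` of the currently occupied sites, in place."""
    occupied = [(i, j)
                for i, row in enumerate(grid)
                for j, site in enumerate(row)
                if site != "V"]
    n_cull = round(fraction * len(occupied))
    for i, j in rng.sample(occupied, n_cull):
        grid[i][j] = "V"
    return n_cull

# Example: void 95% of a saturated 20x20 grid, leaving 20 scattered founders.
rng = random.Random(1)
grid = [[rng.choice("AS") for _ in range(20)] for _ in range(20)]
cull_dispersed(grid, 0.95, rng)
```

In a full run this function would be called once every 50, 100 or 200 generations, with the ordinary lottery rules applied in between.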

In runs with deep population cullings, regrowth takes place from small, isolated patches that tend to be dominated by one variety or the other (the founder effect). Much of the competition is not directly A against S at their common border, but rather is a race for free expansion into unpopulated regions. Higher values of h imply slower growth rates for both varieties, but as h approaches 1, altruists are affected relatively less; this is because at h=1, non-altruist populations do not grow at all, but altruist populations may still grow for somewhat higher values (depending on parameters b and c). Therefore, higher h and deeper, more frequent culling are the conditions more favorable to the emergence of altruism, consistent with our model results (Figs 6, 7 and 8).

Fig 6: Dispersed culling results. 95% of sites are selected at random and vacated every 100 generations. Patches of altruists are able to regrow at a greater rate than non-altruist patches. The parameter h controls the speed of regrowth. The slower the regrowth, the more effective is the disturbance for promoting the viability of altruism.

Fig 7: Dispersed culling results. A fixed proportion of sites is selected at random and vacated on a regular schedule. More frequent culling creates a more favorable environment for evolution of altruism.

Fig 8: Dispersed culling results. In this figure, regrowth rate and frequency of culling are held fixed while proportion culled is varied from 75% to 98%. More severe culling creates a more favorable environment for evolution of altruism.


Three trends apparent in these charts are readily explained. First, increasing h while holding other parameters constant has the effect of increasing the viability of altruism, i.e., decreasing the stalemate g. This is because higher h makes the void a more formidable competitor, increasing the importance of the altruists' competitive growth advantage. Second, with h held constant, increasing the frequency of culling also makes altruism more viable. For the entire range of frequency values in the chart, there is ample time to regrow and fill the grid to saturation between cullings; but the runs with least frequent culling (200 generations) spend more than 3/4 of each cycle in a saturated, direct-competition phase, while those with the most frequent cullings (50 generations) spend almost the entire cycle in a growth phase. In direct competition, altruists are at a disadvantage, but during the growth phase, regions with high concentrations of altruists advance and spread faster. Thirdly, for similar reasons, increasing the culling percentage has a salutary effect on the viability of altruism. Thus stalemate g decreases monotonically with decreasing culling factor (the percentage that remain unculled).

A fourth trend presents more challenge: Comparing stalemate g for high and low c, we find some upward trends with c and some downward. (These are adjacent lines of data all through the tables.) Where viability of altruism is high, it tends to be higher when c is lower; but when viability of altruism is already low, it tends to be lowest for low values of c. The reason for the former trend is that raising c and b in parallel increases the growth rate for both A's and S's (b always overpowers c); hence more of each cycle is spent in the saturated phase. The latter trend is just what we observed in Section IIIA: for direct competition on a saturated grid, higher c and b makes altruism more viable. The reason for this remains unclear.
 

IIIC. Population culling in a compact swath

In a variation on the population culling model in Section B above, we specified that cullings cut a square swath across the grid in which all occupants are removed. The swath had area equal to half the total grid, but was randomly placed each time. (The swath was also permitted to straddle the grid's periodic boundaries, leaving behind a cross rather than a frame of occupied sites.) This mechanism corresponds in nature to damage from an extreme weather event, the outbreak of a parasite, or any other catastrophe which may devastate a compact geographic region, leaving outlying areas unaffected.
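A sketch of the swath culling follows, again under our own assumptions about representation: a square n-by-n grid stored as a list of lists with "V" for void, and a hypothetical function name `cull_swath`. The modulo arithmetic is what lets the swath straddle the periodic boundaries.

```python
import math
import random

def cull_swath(grid, rng):
    """Void one randomly placed square covering about half of a square grid."""
    n = len(grid)
    side = round(math.sqrt(n * n / 2))      # square with area close to n*n/2
    top, left = rng.randrange(n), rng.randrange(n)
    for di in range(side):
        for dj in range(side):
            # Wrap around the edges, as on a torus.
            grid[(top + di) % n][(left + dj) % n] = "V"
    return top, left, side

# Example: on a 20x20 grid the swath is 14x14, i.e. 196 of 400 sites.
grid = [["A"] * 20 for _ in range(20)]
cull_swath(grid, random.Random(1))
```

Since the side must be a whole number of sites, the voided area is only approximately half the grid.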

This scheme was found to be an efficient segregating force, creating within a few culling cycles very clearly-defined regions filled densely with homogeneous populations of type A or S (Fig. 3). Competition takes place both at boundaries between A and S regions, and also in the rate at which each group regrows into the voided swath. As in Section B, the frequency of culling determines the balance between these two forces. Because of the segregation, these runs were surprisingly hospitable to the evolution of altruism.

Note that, compared with Section B, we find a much lower stalemate g (greater hospitability to strong altruism) in runs with 50% compact culling than with 50% distributed culling. In fact, experiments with a compact culling factor of 50% are even more hospitable to altruism than are uniform culling models in which only 5% are left standing. Growth into a vacated region from the edges takes much longer than filling in an equivalent area of smaller, distributed voids, and hence is a more sensitive test of a population's ability to expand.

Two trends visible in Section B above may be detected here as well, and the same reasoning applies: First, increasing h with everything else held constant creates an environment more hospitable to altruism because the growth rate of non-altruist patches is reduced while the growth rate for altruist patches is less affected. Second, increasing the frequency of culling also enhances the viability of altruism, by increasing the fraction of the time that growth is taking place unimpeded into void regimes.

Another clear trend noted in our results is that the outcome of individual runs is less predictable in this section (with culling across a swath). In other words, the range of g values for which A or S may randomly prevail in a given run is largest with this paradigm. We speculate that the correlation between maximum effectiveness of group selection and minimum predictability is a broad trend. The problem of group selection may be stated: how can individual selection, which is much quicker and more efficient, be forestalled long enough for intergroup differences to show their effect? This formulation suggests that any stochastic effect, making the short-term outcome less certain, is likely to decrease the importance of individual selection relative to group selection.

Fig 9: Block culling results. A square swath corresponding to half the total grid area is vacated every 100 generations. Results demonstrate the same trend as Fig 6. Note that much deeper culling is required in the dispersed case in order to have comparable effect; this is because regrowth into scattered voids is much more rapid than into large, vacant blocks. Conversely, culling in a compact block is much more effective for promoting the evolution of altruism.

Fig 10: Block culling results. The same trend appears as in Fig 7: More frequent culling creates a more favorable environment for evolution of altruism.

IIID. Elastic Population Densities in Steady State

A trait with positive selective value may either increase the rate of free population expansion, or else it may augment the steady state density at which a population may be supported in a given environment. Models in this section allow for variable population densities, with patches of altruists supported at a higher density than patches of non-altruists, but the population is not continually expanding as in the above variation. Rules of the lottery have been modified as described in Section II-B above so that voids (V) co-exist with A's and S's in steady state. Altruists will always lose in head-to-head competition with their selfish fellows but it is easy to arrange for altruists to have a greater steady state filling factor, so that regions dominated by altruism are more densely populated than regions in which non-altruists prevail.

Table 2

   h      x      c       b       Theor.      Theor.     Stalemate   Observed
                                Altr rss    Self rss        g           r
  0.80   0.20   0.01   0.0395    0.821       0.80         15.0        0.73
  0.80   0.20   0.03   0.1225    0.856       0.80         17.8        0.76
  0.80   0.20   0.10   0.428     0.920       0.80         24          0.82
  0.80   0.20   0.30   1.35      0.967       0.80         36          0.97
  0.80   0.20   1.00   4.61      0.989       0.80         47          0.97
  ----------------------------------------------------------------------------
  0.90   0.20   0.01   0.0295    0.640       0.60          5.8        0.35
  0.90   0.20   0.03   0.0880    0.709       0.60          5.7        0.38
  0.90   0.20   0.10   0.3465    0.871       0.60          9.0        0.60
  0.90   0.20   0.30   1.21      0.959       0.60         16.7        0.68
  0.90   0.20   1.00   4.47      0.989       0.60         34          0.88
  ----------------------------------------------------------------------------
  0.95   0.10   0.01   0.027     0.665       0.60          4.7        0.31
  0.95   0.10   0.03   0.084     0.775       0.60          5.1        0.40
  0.95   0.10   0.10   0.389     0.937       0.60         14.0        0.75
  0.95   0.10   0.30   1.36      0.982       0.60         39          0.89
  0.95   0.10   1.00   4.51      0.994       0.60         37          0.88

In Table 2, the left three columns are input parameters: h and x control the "fitness of voids". c is the gross cost of altruism, before the altruist's benefit to himself, b/5, is deducted. The critical b tabulated in the next column is the result of many computer runs searching for a b value which leads to stalemate, i.e., altruists and non-altruists may coexist for many generations or, alternatively, each type may prevail in the competition equally often. The next two columns hold theoretical steady-state filling values for altruists and non-altruists, as computed by equations (4) and (5) above. The ratio of these two is the driving force for the evolution of altruism. In the next column, g is the benefit:cost ratio corresponding to c and the critical b, as computed in equation (6). The last column is the observed grid filling factor at stalemate, the ratio of filled sites to all grid sites.

Two trends are clearly visible: As h rises, voids are relatively more competitive, and the advantage of altruism becomes more important. This can be seen by comparing each group of five lines (in which h is held constant) with the other two groups. Altruism can evolve at a lower g when h is higher, because the voids are a bigger factor in the competition.

The second trend is that within each group, critical g increases when both c and b are raised. The reason for this is less transparent, but we propose that it can also be understood in terms of the relative competitive position of the void. Larger c leads to larger balancing b, which increases the competitive advantage of both A and S with respect to V. The result is that the stalemated battle which defines critical g is fought in a grid with a higher filling factor.

Our results show generally that the addition to the model of elastic population density creates a hospitable environment for the evolution of altruism, even without major disturbance events. We find that altruism can prevail with g ratios as low as 4.7 (for comparison, the WPD adaptation of Hamilton's rule in this situation would call for g=2, while the WPD result for a fixed-density grid corresponds to infinite g). The prospects of the altruists improve with decreasing steady-state occupancy of non-altruists. Fig. 11 charts stalemate g as a function of the measured grid vacancy rate. The roughly exponential decline of g with increasing vacancy rate suggests that this variable, which is really a surrogate for population elasticity, is the primary factor responsible for the variance of stalemate g.

The original reason for including voids in the grid was to allow the modeling of elastic population densities. The population elasticity has proven to be a critical factor; indeed Fig. 11 suggests that population elasticity explains most of the variation in the viability of altruism within our model. Nevertheless, we have only begun to explore the population elasticity variable, with a dynamic range of only a factor 3: The present model allows each site to be occupied by at most one individual, so that maximum population density is 100%; minimum population density is about 30% because when parameters are adjusted for a density lower than about 30% the population is too fragmented to survive, as it is vulnerable to random extinction events. The minimum g=4.7 that we observe may well be surpassed in model variations that allow for a population variable at each grid site. This suggests a line of investigation for future research.

Fig 11: In Section D, evolution of altruism is made possible by the greater population density in patches of altruists. Since each site is occupied by either 1 or 0 individuals, there is room for this only to the extent that parameters allow for a steady-state vacancy rate in patches of non-altruists. This figure demonstrates that the population elasticity is an important determinant of the hospitability of any parameter set to the evolution of altruism. The y axis, on a log scale, is "Stalemate g", the ratio of communal benefit to net individual cost. Low values of g indicate that altruism may emerge more easily. The x axis is grid vacancy rate at stalemate, corresponding to the potential for population elasticity.


IV. Discussion of Results

We have sought to construct models for altruism in which population structure emerges purely from localized interactions. We began with fixed-density models after WPD, where the finding was that weak but not strong altruism could be supported, so long as b and c were both small. It is tempting to seek general meaning in this finding, and to seek some correspondence to the well-known rule that weak altruism but not strong may evolve in a model where the altruist's benefit is scattered randomly on a large population.

It is a lesson of MLS theory that weak and strong altruism, as parts of a continuum, can be understood with a single theory. The question of whether either one can evolve depends upon the balance between within-group and between-group selection (Sober & Wilson, 1998). Our analysis reveals that the coincidence between the weak/strong altruism boundary and the minimum b/c ratio for the emergence of altruism does not signal any more general relationship, and that only the narrowest significance should be attached to this result. Not only do vacancies allow the evolution of strong altruism, but so also does increasing the absolute value of b and c (Table 1). Furthermore, if the rules of the grid are altered such that corners are included in each neighborhood, then the demarcation becomes approximately b=13c, though the boundary between weak and strong altruism is now at b=9c, for neighborhoods of size 9 (results not tabulated here). Thus the significance of the weak/strong boundary is not even independent of geometry.

We have generalized the WPD model to allow for variable population density, and found this indeed to be a crucial factor for the viability of altruism. We began with models in which groups of altruists, though unable to compete head-to-head against non-altruists, were nevertheless able to grow into a vacant habitat at a faster rate. Not surprisingly, models in which the populations were periodically culled, creating empty space to be repopulated, constituted a friendly environment for the emergence of altruism. We moved from there to model variations in which dispersed vacancies were built into the population, in such a way that clusters of altruists could exist in steady state at densities up to 3 times the corresponding population density for clusters of non-altruists. We saw that altruists were able to succeed in this model through a mechanism that was somewhat less transparent than that of free expansion: First, viscosity supports partition of the lattice into patches dominated by one or the other variety. The fitness advantage which altruists confer upon their neighborhood permits those patches dominated by altruists to establish denser populations. In the competition that takes place at patch boundaries, the greater density of altruists permits them to counteract the fitness advantage of non-altruists in close proximity, and in some cases to prevail.

Population viscosity models, because they assume only a two-dimensional geography and limited dispersal speed, are a formalism of great generality. We have seen that altruism may be supported in these models quite generally, and that the sorts of strong altruism that can be supported are such that the altruistic trait must contribute to a higher population density in patches of altruists than exists in patches of non-altruists. It is not difficult to think of examples of traits that have this property, and to contrast them with other kinds of altruistic traits that do not. A forest canopy is completely closed, and new trees can only grow up as old ones die out. A new variety of tree that produces more seeds than the old will take over such a forest via individual selection, assuming that each seed has the same chance of finding a vacant site as any other. However, an altruistic trait that enables all trees in a neighborhood to produce more seeds cannot prevail. But consider a trait that enables a tree to make more efficient use of available light, such that the same number of seeds is generated by a tree covering a smaller patch of sky. A single tree of this sort has no advantage over other trees in its neighborhood, since it produces no more seeds than they do. It cannot emerge in a grove via individual selection; however, a grove of constant area will support more such trees, and our viscous model predicts that this variety will come to dominate a forest through a group selection process.

Our model avoids imposing assumptions about genetic relatedness, population density and population structure; rather, these properties are allowed to emerge spontaneously from local interactions and geometry. We introduce as a measure of the viability of strong altruism the parameter g, equal to the ratio of total communal benefit to net individual cost of altruism. The lower the value of g at which altruism may sustain itself, the more hospitable we say a model is to the emergence of altruism. g is bounded from below at unity, since it is impossible under very general assumptions for altruism to prevail when its average impact on fitness of self and neighbors is negative. The smallest values of g to emerge from our model are between 3 and 4. These values are greater than predicted by Hamilton's rule (which ignores the variation in local competition) but may be highly significant for the evolution of group-level adaptations, which do not always require extreme self-sacrifice. (Sober & Wilson, 1998)

Evolutionary theory during the 1960's and 70's was informed largely by the great body of population genetics literature created and inspired by Fisher (1930) in the middle part of this century. The approach which Fisher pioneered treats population growth differentially, using continuous functions to approximate their discrete analogs. Large population size is implicit to this framework, and random mating is frequently invoked. In the ensuing decades, computers have become ubiquitous and large-scale modeling has become practical and generally available. The words "chaos" and "complexity" have emerged into common parlance, and the conventional wisdom has gained respect for the disparity that frequently arises between systems of discrete, random events and the continuous models used to approximate such systems analytically. (Wilensky, 1998) One of our findings (Section IIIC) is that stochasticity itself is a factor favoring the selection of altruism. If computer systems to support models such as the present one had been conveniently available in 1966, it is possible that they might have played a helpful role in the group selection debate, which otherwise was prone to abstraction. Perhaps the present availability of computer models treating group selection is sufficient reason to re-open that debate and re-examine its essential conclusions.

References



Appendix 1

Consider a grid fully occupied by NA altruists and NS non-altruists, for a total of NA+NS=N occupants.

Let the number of A-A neighbor pairs be denoted NAA, and similarly for NAS and NSS.

The total number of neighbor pairs is NAA+NSS+NAS= 2N, divided as follows:

NAA +NAS/2= 2NA

NSS + NAS/2= 2NS

Now define r=NA/N as the proportion of A's in the grid, and let m=NAA/(2NA) be the probability of finding an A in a random site adjacent to another A.

Then the correlation coefficient can be defined in the usual way,

$$ R = \frac{\langle xy \rangle - \langle x \rangle \langle y \rangle}{\langle xx \rangle - \langle x \rangle^2} $$

where x is any two-valued function, say x=1 at a site holding an altruist and x=0 for a non-altruist. y is then the same function evaluated at an adjacent site, so that <y> has the same meaning as <x>, and <xy> is the product of the function with itself, averaged over all adjacent pairs.

The choice of values 0 and 1 simplifies the computation, but the pair correlation so defined is independent of this choice. <x>=<y>=r follows immediately from the definition, and <xy>=mr is derived from counting the A-A pairs. Similarly, <xx>=r because it evaluates to 1 for each A and 0 for each S. Hence:

$$ R = \frac{mr - r^2}{r - r^2} = \frac{m - r}{1 - r} \qquad (7) $$

Alternatively, R can be defined as the difference in probability of finding an A when looking next to an A and looking next to an S. The former is m by definition. The latter is

P(A|S) = (NAS/2)/(NSS+NAS/2) = NAS/(4NS) = r(1-m)/(1-r)

So this alternative definition of R is seen to be

$$ R = m - \frac{r(1 - m)}{1 - r} = \frac{m - r}{1 - r} $$
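The agreement of the two definitions can be confirmed by counting neighbor pairs on a small periodic grid. This is a minimal sketch; the grid representation (a list of strings of "A" and "S") and the function names are our own, and the grid is assumed to be fully occupied, at least 3 sites on a side, and to contain both types.

```python
def pair_counts(grid):
    """Count unordered neighbor pairs (NAA, NAS, NSS) on a torus."""
    rows, cols = len(grid), len(grid[0])
    naa = nas = nss = 0
    for i in range(rows):
        for j in range(cols):
            # Looking only right and down counts every pair exactly once.
            for di, dj in ((0, 1), (1, 0)):
                a = grid[i][j]
                b = grid[(i + di) % rows][(j + dj) % cols]
                if a == b == "A":
                    naa += 1
                elif a == b:
                    nss += 1
                else:
                    nas += 1
    return naa, nas, nss

def neighbor_r(grid):
    """Return R from equation (7) and from the P(A|A) - P(A|S) definition."""
    naa, nas, nss = pair_counts(grid)
    n = len(grid) * len(grid[0])
    n_a = sum(row.count("A") for row in grid)
    r = n_a / n                    # proportion of A's
    m = naa / (2 * n_a)            # chance of an A next to an A
    r_eq7 = (m - r) / (1 - r)
    r_alt = m - nas / (4 * (n - n_a))
    return r_eq7, r_alt

# Example: the top half of a 10x10 torus is A, the bottom half S.
stripes = ["A" * 10] * 5 + ["S" * 10] * 5
print(neighbor_r(stripes))
```

For the striped example, NAA=90 and NAS=20, so m=0.9, r=0.5, and both expressions give R=0.8.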



Appendix 2

We modify the WPD model by randomizing the grid locations of all individuals once in each generation. This may be conceived as a zero-viscosity version of the model, in which the population is thoroughly mixed. Another way to characterize the difference from the original WPD model is that clustering of similar types has been eliminated, so that the neighbor correlation (Hamilton's relatedness r) is zero.

This simplified system may be approached quasi-analytically. We assume that the grid holds a random mixture of half A's and half S's, and ask, for a given c, what value of b will lead to a lottery result which replicates the equal population proportions. With the random mixing of each generation as specified, this must lead to a stalemate in the competition.

We note that there are just 10 types of five-site lotteries, and we analyze them exhaustively, then combine them in statistical proportions. The 10 different possibilities are:

     S   S   A   A   A
    SAS AAS AAS AAA AAA
     S   S   S   S   A

 S   A   A   A   A
SSS SSS ASS ASA ASA
 S   S   S   S   A

1 : 5 : 10 : 10 : 5 : 1

The numbers under each pair of diagrams are their relative frequencies, taken from Pascal's triangle. Random shuffling in each generation assures that these frequencies will hold. Within each vertical pair, the proportions are further subdivided 1:4 or 4:6, reflecting the probability that an A or an S will appear at the central location. A full accounting of frequencies for the 10 diagrams is

1 : 1:4 : 4:6 : 6:4 : 4:1 : 1

The sum of the proportion numbers is 32, so we proceed using 32 as a denominator to seek a weighted average of the results of the 10 lotteries. To determine the average fitness of the lottery participants, we surround each diagram with individuals of type X, 50% of whom are A's. X's are reckoned as contributing b/2 to the fitness of each neighboring individual.

For example, consider this diagram:

  X
 XSX
XAASX
 XSX
  X

The central A has itself and 1 other A for a neighbor, so its fitness is 1+2b-c. The A in the wing has 3 X neighbors, one A and itself, for a fitness of 1+7b/2-c. The 3 S's each have fitness 1+5b/2 by the same reckoning (3 X neighbors plus the central A). So the result for this lottery is that the probability of A being victorious is

$$ \frac{(1+2b-c) + (1+7b/2-c)}{(1+2b-c) + (1+7b/2-c) + 3(1+5b/2)} = \frac{2 + 11b/2 - 2c}{5 + 13b - 2c} \qquad (8) $$

The above contribution is weighted by 4/32 and combined with 9 other expressions similarly derived to create an expression for the proportion of A's in the daughter generation.

The sum of 9 fractions (one is zero) with different denominators is awkward to treat analytically, but a Newton's method solution proceeds without difficulty. The expression is set equal to 1/2, and the system may be solved (numerically) for b for a given value of c.

The results are:

c=0.01, b=0.01666616
c=0.1,  b=0.166376
c=1,    b=1.641464
c=10,   b=16.274024

For small b and c, the sum of fractions can be linearized, neglecting quadratic and higher terms, to produce a simple expression in b and c, which can be solved completely by hand. The answer,

b=(5/3)c for small b,c

is apparent from the numerical results.
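The whole calculation of this appendix fits in a few lines. The sketch below follows the fitness reckoning of the worked example (b from each A neighbor and from oneself, b/2 from each outside X, cost c for an A) and weights the 10 lotteries 1, 4, 6, 4, 1 for each central type. It is an illustration rather than the code behind the listed results: the function names are ours, the root is found by bisection rather than Newton's method, and the bracket [c, 5c] is an assumption that is checked at run time.

```python
from math import comb

def prob_a_wins(center_is_a, k, b, c):
    """Chance that an A wins one lottery: a center plus 4 wings, k of them A's."""
    t = 1 if center_is_a else 0
    # Center site: its neighbors are the 4 wings; an A also receives its own b.
    center = 1 + b * (k + t) - c * t
    # Wing site: 3 outside X neighbors worth b/2 each, plus the center if an A.
    wing_s = 1 + 1.5 * b + b * t
    wing_a = wing_s + b - c        # an A wing adds its own b and pays c
    a_fitness = t * center + k * wing_a
    total = center + k * wing_a + (4 - k) * wing_s
    return a_fitness / total

def next_a_fraction(b, c):
    """Weighted average over the 10 lottery types; the weights sum to 32."""
    return sum(comb(4, k) * prob_a_wins(t, k, b, c)
               for t in (0, 1) for k in range(5)) / 32

def stalemate_b(c, steps=60):
    """Bisect for the b at which the daughter generation is again half A's."""
    lo, hi = c, 5 * c              # assumed bracket, verified on the next line
    if next_a_fraction(lo, c) >= 0.5 or next_a_fraction(hi, c) <= 0.5:
        raise ValueError("bracket does not straddle the stalemate")
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if next_a_fraction(mid, c) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for c in (0.01, 0.1):
    print(c, stalemate_b(c))
```

The call `prob_a_wins(True, 1, b, c)` reproduces expression (8), and for small c the ratio b/c returned by `stalemate_b` should approach 5/3.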

There is a short, convenient but less rigorous argument that leads to this same result:

The b that is exported by each altruist is partitioned as 2/5 that benefits direct competitors within that altruist's own lotteries and 3/5 which is scattered to other lotteries. (This comes from the fact that 1/5 of the altruists will be at the center of their groups, and all of their b goes to direct competitors, while 4/5 are located in the wings, where only 1/4 of their b supports direct competitors.) In the average fivesome, there are 2.5 A's. Each receives 1 b from itself and 2b/5 from each of the 1.5 other A's, for a total fitness of 1+8b/5-c. The 2.5 S's in the fivesome each receive an average 2b/5 from 2.5 A's in the group, for an average fitness of 1+b. To specify that the lottery results in an A victory just half the time, we equate the average fitness of A's and S's in the group: 1+b = 1+8b/5-c reduces to b=(5/3)c.


Appendix 3

Consider a grid filled with S's and V's (holes), thoroughly homogenized. The proportion of S's is r, and the proportion of V's is 1-r. The average lottery will include 5r S's, each with fitness 1, and 5(1-r) V's, each with fitness h. Additional lottery tickets numbering x are assigned to the V's. If the grid is in steady state, the lottery will result in an S a fraction r of the time. Equating the lottery odds to r, we have

$$ r = \frac{5r}{5r + 5(1-r)h + x} \qquad (9) $$

When this equation is solved for r (discarding the empty-grid solution r=0), the result is equation (4), rss = 1 - x/(5(1-h)).

For a binary population of A's and V's, the same logic applies, but with shared benefit adding an extra ripple. The average A, surrounded by 4 cells, receives an altruistic contribution to its fitness of 4rb in addition to the b which it confers upon itself. As above, there are 5(1-r) V's, each with fitness h, plus a bonus of x. The steady state equation corresponding to (5) for altruists is:

$$ r = \frac{5r\,(1 - c + (b + 4rb)/5)}{5r\,(1 - c + (b + 4rb)/5) + 5(1-r)h + x} \qquad (10) $$

Clearing the denominator turns this into a quadratic equation in r, whose solution is equation (5) for r.
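Both steady-state densities can be evaluated numerically from the two balance equations of this appendix. The sketch below is our own illustration: dividing the altruist equation by r (for r not zero) leaves 5(1-r)(f-h) = x with f = 1 - c + (b + 4rb)/5, which expands to the quadratic (4b/5)r^2 + (1 - c + b/5 - h - 4b/5)r - (1 - c + b/5 - h - x/5) = 0. The function names are hypothetical, b>0 is assumed, and the root with the positive square root is taken.

```python
import math

def selfish_density(h, x):
    """Steady-state filling of a grid of S's and V's, from equation (9)."""
    return 1 - x / (5 * (1 - h))

def altruist_density(h, x, b, c):
    """Steady-state filling of a grid of A's and V's, from equation (10)."""
    quad = 4 * b / 5                       # coefficient of r^2 (needs b > 0)
    base = 1 - c + b / 5 - h               # fitness edge of a lone A over a void
    lin = base - quad                      # coefficient of r
    const = -(base - x / 5)                # constant term
    disc = lin * lin - 4 * quad * const
    return (-lin + math.sqrt(disc)) / (2 * quad)

# Example: first row of Table 2 (h=0.80, x=0.20, c=0.01, b=0.0395)
print(round(selfish_density(0.80, 0.20), 3),
      round(altruist_density(0.80, 0.20, 0.0395, 0.01), 3))
```

Evaluated on the rows of Table 2, these expressions should agree with the two "Theor." columns to the precision shown there.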