For the final two weeks of class, you will build a simulation model of genetic drift. Genetic drift is important to evolutionary biology, plant and animal breeding, biotechnology, and biological conservation. We will focus on the conservation applications in this case, and as such we will focus on how genetic drift affects allelic diversity (which is important for a species' ability to adapt to its environment) and heterozygosity (which is important for avoiding genetic diseases, and for an individual's ability to fight off wide ranges of infectious diseases).
We will break the work down into four stages:
Initial setup of the spreadsheets, and simulation of a single run of 500 generations. This is your goal for this week.
Modification to run the simulation 100 times, collect data from each. This is the goal for next Monday.
Increase population size
Add immigration. These last two are easy, and can both be done on the last day of class.
You should be able to complete these four versions of the model by the end of the final day of class, on May 11th. When you've completed the simulations you will write up the results and submit your writeup and simulation models in place of a final exam for the class.
The model will be general, and is not meant to model a particular species. However, we have some choices to make about how to construct the simulation that need to be guided by biological characteristics. So, in a general sense, the "species" we are simulating will have the following properties:
Diploid (meaning there are two alleles per gene, one inherited from the mother and one from the father)
Sexually reproducing
Random mating
The model will be individual-based, meaning that our model will simulate individuals in the population in our model, and will be a simulation model, meaning that we will study the virtual individuals to understand how drift affects the gene pool rather than writing equations and finding the solutions for particular variables (our model would be analytical instead of stochastic if we analyzed equations to find solutions). Random changes in gene frequency (i.e. genetic drift) will emerge from the process of random mating between our simulated individuals, which will randomly select some individuals to produce offspring while others produce none. The fact that mates will be selected at random from the population will make the model stochastic, and each simulation run will be somewhat different from any other. Our model will thus be a stochastic individual-based simulation model.
Although we don't need to specify a particular species that our model
represents, the type of random mating we will simulate is easiest to
imagine from "broadcast spawning", which is common in
attached marine invertebrates like the coral polyps in the example
image. Species such as anemones are not capable of moving around to find
a mate, and both males and females release their gametes into the water
column, where fertilization occurs. There is little opportunity in
broadcast spawning for individuals to select mates, and random mixing of
alleles is the best model of how it works.
To keep the simulation relatively simple we will only concern ourselves with a single gene with 5 different alleles. Any single individual can only have two alleles, one of which was inherited from its mother and one from its father, but different individuals in a population can have different ones - the five alleles in the population will be labeled A, B, C, D, and E.
The simulation model will use Excel to store all the data, and to do much of the work of the simulation, including the random mating and calculation of summary statistics, such as allele frequencies, heterozygosity, and allelic diversity. We will write VBA macros to simulate change over time across generations, and to record results each generation of each model run.
The first thing we need to do is to establish our initial populations of individuals. We will only use two properties of each individual: sex and genotype.
We need to set up the genotypes for the parent population, which will establish the starting allele frequencies for the population's gene pool. We'll refer to these as the initial conditions, and each time we run the simulation we'll return the population to this starting point. First we'll set up the layout.
1. Open Excel to a blank spreadsheet. In cell A1 write "Parent population".
2. In cell A4 of the first worksheet (Sheet1) write "ID", and in cell A5 through A54 enter sequential numbers from 1 to 50 (you can use the "fill series" trick, or enter numbers 1 and 2, then extend them with the fill handle).
3. In cell C3 write "Males", and in cell C4 write "Allele 1". In cell D4 write "Allele 2".
4. In cell F3 write "Females", and in cell F4 write "Allele 1". In cell G4 write "Allele 2".
5. Double-click on the "Sheet1" tab and rename the sheet "Simulation".
6. Save the spreadsheet as a macro-enabled spreadsheet called "drift_simulation_n100" - use "File" → "Save as", and then make sure you select "Excel Macro-Enabled Workbook" from the "Save as type:" drop-down menu.
If you don't save
as a macro-enabled worksheet your macro won't be saved...don't let
this happen to you!
At this point your worksheet should look like this:
We will begin this population all five of the alleles (A, B, C, D, and E) at a frequency of 0.2, which means that each of the five will represent 20% of the total alleles in the population.
We can translate gene frequencies into genotype frequencies using the Hardy-Weinberg formula. Hardy-Weinberg translates allele frequencies into genotype frequencies by assuming that alleles combine at random, in proportion to their frequencies in the population. We can see how it works using a version of a Punnett square, with each parental allele in a row and column label, which then combine to form genotypes of offspring:
A (0.2) |
B (0.2) |
C (0.2) |
D (0.2) |
E (0.2) |
|
---|---|---|---|---|---|
A (0.2) | AA (0.22) |
AB (0.2)(0.2) |
AC (0.2)(0.2) |
AC (0.2)(0.2) |
AE (0.2)(0.2) |
B (0.2) | AB (0.2)(0.2) |
BB (0.22) |
BC (0.2)(0.2) |
BD (0.2)(0.2) |
BE (0.2)(0.2) |
C (0.2) | AC (0.2)(0.2) |
BC (0.2)(0.2) |
CC (0.22) |
CD (0.2)(0.2) |
CE (0.2)(0.2) |
D (0.2) | AD (0.2)(0.2) |
BD (0.2)(0.2) |
CD (0.2)(0.2) |
DD (0.22) |
DE (0.2)(0.2) |
E (0.2) | AE (0.2)(0.2) |
BE (0.2)(0.2) |
CE (0.2)(0.2) |
DE (0.2)(0.2) |
EE (0.22) |
Homozygotes are produced when the same allele is contributed by each parent, which means that if the frequency of the allele is p, then the homozygous genotype should be produced at a frequency of p x p, or p2 - there is only one way in the table for each homozygous combination to occur, so p2 is the final calculation of the frequency of the genotype. Since all of the alleles have a frequency of 0.2, this means homozygotes occur at a frequency of 0.04.
When two different alleles combine their frequencies are also multiplied - if the frequency of the first is p and the frequency of the second is q, then the frequency of the heterozygote would be pq. If you look at the table, though, each heterozygote occurs twice in the table, and must be added together. Equivalently, pq + pq is 2pq, so every heterozygote occurs at a frequency of 2 x 0.2 x 0.2 = 0.08.
There are 50 males in the population, and 50 x 0.04 = 2, so each homozygote will occur two times. Each heterozygote will occur 50 x 0.08 = 4 times. The genotype frequencies we need to use are shown in the table below.
|
|
1. With this information we can set up the initial population - start with Allele 1.
2. Set up Allele 2's. Now, use the "Number of Males" for each genotype to assign Allele 2.
3. The females are no different from the males genetically, so copy the genotypes for the Males and paste them to the Females. You now have a population with five alleles at equal frequencies, with exactly equilibrium numbers of homozygotes and heterozygotes.
If all went well the sheet should look like this (rows 4 through 26, at least - the last row will be 54, with an ID of 50):
Each row of this sheet represents a separate individual, with the genotypes of the 50 males in the population in columns C and D, and the genotypes of the females in the population in columns F and G. Since the model tracks the properties of individuals in the population it is individual-based.
4. Create a new sheet (click on the "Insert Worksheet" button, which is the plus inside a circle next to the Simulation tab). Double-click the tab at the bottom that says Sheet2 and re-name it "Original".
Copy the male and female allele frequencies from the "Simulation" tab (c3:g54, including the labels), and paste them in the "Original" tab, in cells a1:e52. We will use this copy to record the initial conditions so we can easily reset the population to this state by just copying and pasting the genotypes from the Original sheet into the Simulation sheet. At this point you should see:
5. Save your work. I'll remind you to do this at the end of each major step, but saving is a Good Thing, particularly after you have made a change and are sure it's right - feel free to save each time you successfully complete a step.
The next thing we want to do is to measure relevant characteristics of the population's gene pool as the simulation progresses. There are several different things we could measure, but the primary issues we would be interested in from a conservation perspective are changes in allele frequencies and changes in heterozygosity.
We will track the frequencies of the alleles over time, and keep a count of the alleles that are still present - the count of alleles is allelic diversity. If any allele goes to fixation that means it has reached a frequency of 1, and is the only allele left in the population. This is the always the expected end result of genetic drift, given enough time, and we are interested in how long it takes to reach fixation, if it occurs. If a single allele goes to fixation, all the others have gone to a frequency of 0 and are lost from the population. The allelic diversity is at its minimum point when this happens.
We're also concerned about heterozygosity, which is the proportion of individuals that are heterozygous for this gene. This is also an important measure from a conservation perspective, and although we know that it has to go to 0 if an allele reaches fixation (since all individuals have to be homozygous if there is only one allele in the population) it can decline to low levels before this happens. We expect the highest levels of heterozygosity when the allele frequencies are all the same, and as some alleles become more common (and others necessarily become relatively less common) heterozygosity will decline on average. It is possible for up to 100% of the individuals to be heterozygous, and random mating can cause heterozygosity to increase above what we would expect it to be, but on average we expect heterozygosity to decline as the allele frequencies move away from equal numbers.
1. Add another new sheet, which will be called Sheet3. Double-click on the "Sheet3" tab, and rename the sheet "Statistics". We will use this sheet to gather information from the simulation each generation.
2. First we want to calculate gene frequencies. In cell A1 type "Allele", and in cell A2 type "Frequency". Enter A, B, C, D, and E in row 1 of the columns to the right of "Allele".
Now, in cell B2, we will enter a formula that counts up the each letter for the male and female data in cells C5:G54 in the Simulation sheet, and divide by the total number of alleles (which is 200, since we have 100 individuals with 2 alleles each). You can do the counting with a"countif()" function - countif() will count up all the cells that meet a "logical test" that you specify. So, for example, to count up how many "A" alleles are in the population you would enter the formula:
=countif(Simulation!$C$5:$G$54, Statistics!B1)/200
This formula counts the letters in C5 through G54 if they are equal to
the letter in B1. This count is then divided by 200 to give us the
allele frequency.
You'll see that the sheet name has to be included in the cell references because we're referring to cells in two different sheets. This makes the cell references a little harder to read, but splitting the simulation from the statistics will make the model results easier to see. If you enter the cell references by switching to the sheet and selecting the cells with the mouse then the sheet name will be entered automatically. If you are writing the references by hand, you'll need to add the sheet name and exclamation point yourself. A reference without a sheet name is assumed to refer to the same sheet that the function is in.
Copy and paste cell B2 to cell C2 through F2 to get frequencies for alleles B through E.
Frequencies are proportions of a total - if you've done the calculation correctly each frequency should be 0.2 for each allele. As the simulation runs, the frequencies will change but will always sum to 1.
3. Now we will calculate the frequency of heterozygotes in the population.
In the Statistics sheet, label cell A5 "Male heterozygous", and cell B5 "Female heterozygous".
To count up how many males and females have different alleles using a single formula we need to make a pretty complex formula - let's build it in steps to make sure it's working.
Now that you have this for males, use the same formula for the number of female heterozygotes, which are in columns F and G of Simulation.
Finally, in cell G1 type "Heterozygosity". In cell G2 sum the number of male heterozygotes and the number of female heterozygotes, then divide by the total number of individuals (which is ... what?). To begin, 80% of the individuals are heterozygotes, so this should be 0.8 if you did everything correctly.
4. Now we will count the number of alleles present in the population (allelic diversity). This one's easy - we just need to count up the number of "Frequencies" numbers that are not equal to 0 in the "Statistics" sheet.
Label H1 "Allelic diversity", and in H2 use a countif() to count how many rows in B2:F2 of the Statistics sheet are greater than 0. The function is:
=countif(b2:f2, ">0")
The criteria in the second argument is in quotes, and says to only count the cell if the value is greater than 0, which all will be initially - you should get a 5 as your allelic diversity.
Your statistics sheet should look like this:
5. Save your work.
Now that we have frequencies of all the alleles, heterozygosity, and allelic diversity calculated for this initial generation, we need to set up a place to record all of this information as each generation passes.
1. Add another worksheet, and double-click on its tab to re-name it "Results".
2. Label A1 "Generation". Label the next five columns "F.A", "F.B", "F.C", "F.D", "F.E" to represent the frequencies for each allele.
3. Label G1 "Heterozygosity", and H1 "Allelic diversity".
Your Results sheet should look like this:
The layout is the same as the Statistics sheet to makes recording of the results easy.
4. Save your work.
You're now ready to simulate production of the next generation. We will do this first for the male offspring by doing the following:
Randomly select which males and which females will reproduce
Randomly select which of the two alleles each individual carries will be passed on to the offspring
This is done with spreadsheet formulas in the Simulation sheet.
1. Switch to the Simulation sheet.
Label I1 "Breeders".
Label J3 "Fathers" and N3 "Mothers".
Label I4 "Rand males" and M4 "Rand females".
Label J4 "Allele 1" and K4 "Allele 2".
Label N4 "Allele 1" and O4 "Allele 2".
2. We will generate random ID numbers to randomly select which males and females will breed.
You now have a random sample of the numbers of the males that will breed, some of which will breed repeatedly, and some of which will not breed at all (this is just like our bootstrapping examples).
3. Randomly select females to mate the same way as you selected the males. Generate random numbers in column M to select the female breeders, just like you did for the males.
4. Use a lookup() function to copy the alleles for each selected male into the Allele 1 and Allele 2 columns for the breeders.
You should now have a set of alleles for the 50 randomly selected males and females that make up the breeders.
5. Now we will mate the randomly selected males and female breeders to produce the male offspring.
The parents for each offspring are the randomly selected male and female in the same row. To mate them we now just need to randomly select which of their two alleles each parent passes on to their offspring.
You now have a set of randomly generated male offspring.
6. To generate the female offspring we don't want to just use the same parents, because we want every offspring to be produced by a different set of parents (this is a more realistic choice to simulate broadcast spawning, since there is no pair formation between the parents - their gametes are just randomly encountering each other in the water above them).
We could use another set of four columns to generate the fathers and mothers for the females, using a new set of randomly generated ID numbers, but we can accomplish the same thing without any additional columns of formulas...
Male offspring and female offspring are produced the same way, so the male offspring we've just generated could just have easily have been our female offspring instead. We will take advantage of this fact to:
The set of genotypes that were originally generated in the male Allele1 and Allele2 columns by spreadsheet formulas are now entered as letters (not formulas) in the female Allele1 and Allele2 columns. When you pasted the values the whole worksheet recalculated, so a new set of breeders are selected, which generates a new set of male offspring. As a result, you now have a set of male offspring generated from one randomly selected group of parents, and a set of female offspring generated from a different randomly selected group of parents without having to have a second set of breeder columns for the females.
You've completed the first generation of mating - your sheet should look like this (columns A through G will have the same entries, bu I through V are generated with random numbers, and will not have identical ID or allele entries):
6. Everything is set up, we just need to simulate some change over time now. First we'll complete the process of a change in generation manually, and then we'll write a macro to complete these steps for us in the next step.
And with that you've started a new generation. Since the offspring generation from the first round just became the parent population for the second round all of the allele frequencies have changed - if you switch to Statistics you'll see they aren't all 0.2 any more, and heterozygosity will have changed as well (it may have gone up or down a little, but will probably not be 0.8 anymore). Allelic diversity almost certainly will still be 5 - there are 20 copies of each allele in the population, so unless none of the 200 randomly selected parents used to produce the 50 male and 50 female offspring had one of the alleles all five will still be present. As you run through the simulation for multiple generations alleles will be lost, but it will take several generations.
If we were to do this repeatedly 500 times we could simulate 500 generations of drift due to random mating. That's way too tedious, though - good work for a computer, bad work for a person. We'll write a macro that does these steps repeatedly so that we don't have to.
To put everything back to initial conditions switch to Original, copy the male and female alleles from A2 to E52, switch to Simulation, and paste them over the parent population's alleles (you don't need to paste-special, they're not formulas).
Save your work!
Now you're ready to record a macro that you'll use as your first simulation. There are several steps that you'll need to record, and as you've found out in previous exercises, every mouse click and window scroll is recorded once you turn on the macro recorder. Read through the numbered steps below a couple of times so that you have them in mind before turning on the macro recorder.
1. Switch to the Simulation sheet. Go the the View tab and turn on the macro recorder. Call the macro "DriftSimN100", and give it the shortcut key CTRL+SHIFT+D. In the Description write "Genetic drift simulation with no immigration and a population size of 100". Click OK to start recording.
2. Switch to the Statistics sheet. Select B2:H2, and copy the cells (the allele frequencies).
3. Switch to the Results sheet. Select cell B2, then right-click and paste-special as values.
4. Switch to the Simulation sheet. Select cells R5:S54 and copy them. Select cell U5 and paste-special as values.
5. Still in the Simulation sheet, copy all of the offspring in cells R5:V54, select cell C5 and paste-special as values.
6. Turn off the macro recorder.
7. Save your work.
If you open your macro for editing it should at this point look (more or less) like this:
There are a couple of unneeded lines that we'll get rid of later. The important things to make sure your macro has are:
As long as you can identify all those steps in your macro you're ready to go on.
You are now ready to add a loop.
1. Add a For...Next loop that encloses the entire macro recording - put the For after the comment lines, and the Next before End Sub. Have the for loop iterate from i = 1 to 500.
2. Now you need to edit a cell reference to make sure the allele frequencies, heterozygosities, and allelic diversities get put in the right place each generation.
Look for the line in your code where you change to the "Results" sheet. Once you're in Results you should have selected cell B2 as the location to paste the statistics you copied, but we want this to increase by one each time we run through the loop, so change this to:
Range("B" & i + 1).Select
Now when the paste-special as values happens each generation the results will go into a new row.
3. We are not yet recording the generation number, so we need to add this command (not recorded by the macro recorder, so this will be a new line):
Sheets("Results").Range("A" & i+1).Value = i
Since you're using i as the counter on your For...next loop you can use it to record the generation number. The first time through the loop i is equal to 1, so this will assign a 1 to cell A2, the second time through a 2 to A3, and so on.
4. We should do a couple of housekeeping tasks that will need to be done each time the macro is run.
Before the simulation runs we should make sure that we're at initial conditions. Add a blank line before your For loop starts, and add the command:
Sheets("Simulation").Range("C5:G54") = Sheets("Original").Range("A3:E52").Value
The other thing we should do is clear any old simulation runs from the Results sheet to make room for the run we are about to do. On the next line, still before the For loop starts, enter:
Sheets("Results").Range("A2:H501").Clear
Save your work before going on to the next step.
Excel is primarily meant to be used in an interactive mode, and any
change in the contents of the sheet cause the whole sheet to
recalculate, and the sheet display is redrawn so you can see the result
of the change. However, when you are running a macro all of this visual
feedback flashes by too fast to see anyway, yet it slows down the macro
dramatically. It's a good idea before you run the macro to turn off the
visual screen updating, and then turn it back on when the macro is
finished.
Make a blank line after the comment lines but before the Sheets("Original")... line, and add the command:
Application.ScreenUpdating = False
Turn screen updating back on by putting the command:
Application.ScreenUpdating = True
just before the EndSub.
If you haven't done so already, debug - arrange the code editor and the spreadsheet so you can see the Results sheet and hit F8 repeatedly to run through the simulation, at least twice through the loop to make sure it works consistently.
Then switch to the Simulation sheet and make sure the copying and pasting is working right - you want to confirm that the male offspring are copied and pasted to the female columns first, then the male and female offspring are copied and pasted to become the parents each time through the loop.
Run the macro, interpret the results
Once you're confident everything is working hit CTRL+SHIFT+D to run the simulation.
Depending on how fast your computer is this could take anywhere from less than a minute to several minutes. If you debugged successfully and you see a busy cursor don't interrupt. If it has been more than five minutes let me know and we'll check on it.
When the simulation has finished, the Results tab will have all the gene frequency, allelic diversity, and heterozygosity results for all 500 generations. Now it's time to interpret the results.
A good way to explore a simulation like this is to graph the results over generations.
You'll want a separate graph for allele frequencies, heterozygosity, and allelic diversity because the y-axis is different (especially for allelic diversity). Since generations are all equally spaced, you can use either a line graph or a scatter plot with lines connecting and they will look the same. Title each one appropriately, and label the axes.
Most of you will have a single allele become fixed - the allele frequency will go to 1 for one of the alleles, and all the other alleles will be lost from the population (i.e. their frequencies will go to 0). If this happens, you'll see that although there is initially variation in the allele frequencies, with both some increases and decreases in numbers, each allele drops out of the population one at a time, eventually leaving four of them at a frequency of 0 and one at fixation (i.e. at a frequency of 1). Make note of which allele is fixed for your population.
Since the only thing affecting these frequencies is random chance, we would expect that which allele goes to fixation is also random - if you run the model repeatedly you should get different alleles going to fixation. If you run the macro again there's a good chance a different allele will become fixed.
Stage 1 complete!
Congratulations! You're done with the first model, which simulates a single run of 500 generations - this is the hard part, the rest of the models are modifications of this initial one (some very minor modifications at that).
Time to move on to the next model, which will repeat this process 100 times.
We can learn a lot about how the process of genetic drift works by looking at the results from a single run of 500 generations - you can see the unpredictability of changes in frequency over time, for example. Some of you may even have a case in which an allele drops to low frequency and then goes on to increase to fixation.
If you ran the simulation a second time you'll see that the patterns are different each time you run the simulation, and which allele went to fixation was probably different in the two runs. In fact, we would expect to get a different random pattern of change in gene frequencies every time we run the simulation.
The implication of this unpredictability is that we can't rely on a single run to tell us what to expect, because the answer we get will depend on which run we look at.
We know, however, that random outcomes (like tossing coins or rolling dice) can be completely unpredictable at the level of a single trial, but become highly predictable at the level of multiple trials. A single coin toss will either land Heads or Tails, and we can't predict in advance which it will be, but if we toss the coin 100 times we can expect to get about 50 Heads and 50 Tails.
Similarly, if we run the simulation 100 times we can see what typically happens, so that we don't have to rely on a single run - the average time to fixation across 100 runs will be a better predictor of the expected effects of drift than any single run could be. We will also get a fuller set of possible outcomes to look at, and we may find that some runs go to fixation vary quickly while others don't go to fixation at all - we can use the full set of runs to tell us what is typical, and what the range of possibilities are.
Given this, we need to modify our program so that it will run the simulation repeatedly and record results from each run.
Before you start, use "Save as..." to save a copy of your spreadsheet, and call it "drift_sim_n100_time_to_fixation". Use this new version of the sheet for this part of the exercise. Make sure to still use the macro-enabled spreadsheet option so your macro will be saved.
Turning off screen updating is probably the single most effective thing we can do to speed things up, but now we need to squeeze as much speed out of our macro as we can. A single run that takes 1 min to complete is not a big problem, and the time we would spend programming it to run faster could easily exceed the time gain we get. But, if we repeat a simulation that takes 1 min to run 100 times then we wait for 100 min, which is longer than a class period. The time we put into making the code fast will pay off as we move into the second, third, and fourth models, each of which repeat the drift simulations 100 times.
1. A version of the model that runs in about 2 seconds on my computer is here:
The green comments above each line of code mostly explain what's going on, but note a couple of things:
The one command that might be a little hard to understand is the first one in the loop - let's break down what it's doing:
More speed - switch to a Do...while loop
We can also speed up a lot if we don't continue the simulation after one of the five alleles has gone to fixation - that is, if one allele is the only one present there won't be any additional change in allele frequency over time to simulate, and there is no point in continuing the remaining generations at that point.
The version with a Do While... loop looks like this (comments explaining the changes from the previous version in green, and explained further below):
The changes are:
DON'T RUN THIS MACRO - this will not work yet, because we don't have the formula in cell K2 of Statistics that's testing if we're all done yet. We need to add that now before running the macro.
2. To add a test of whether any of the alleles has gone to fixation to the Excel file we need to:
If xlookup was available when we used it in previous labs then do the following:
You now have a cell formula that checks if an allele has gone to fixation, reports the allele if it has, and reports "None" if not.
If xlookup did not work for you then you can use a more complicated option - ONLY DO THIS IF YOU'RE USING AN OLDER VERSION OF EXCEL THAT DOESN'T HAVE XLOOKUP
- Copy the Allele label and the five allele numbers from A1 through F1 and paste them into row 10 (A10 to F10)
- In G10 enter "None"
- In Row A11 enter "Frequency"
- In cell B11 enter the formula =b2, and then copy B11 and paste it to C11 through F11 - each cell should point to the appropriate allele frequency
- In cell G11 enter the number 1
- In cell I2 enter the formula =choose(match(1, B11:G11, 0), "A", "B", "C", "D", "E", "None")
The match() function finds the first occurrence of a 1 in cells B2:G2 and returns its relative position in the range of cells entered as its second argument (if the first occurrence of 1 it finds is in cell B11 match returns 1, because B11 is the first cell it's told to check for, but if it's in C11 it returns 2, and so on). The final argument, 0, just tells match() it has to be an exact match. The position number from match() gets passed to choose(), which uses it to pick the correct allele from the list of alleles that follow the match() function. Since match() finds the first matching value, it will continue to return "None" as long as none of the alleles goes to fixation, but once one of them does its frequency goes to 1, and match() will return that allele. Only a single allele can go to fixation at a time, so this will always return the only allele with a frequency of 1.
3. Now we will enter the formula in K2 that tests if we are either at generation 500 or are one of the alleles has a frequency of 1.
The formula in K2 checks is an or() function, which returns TRUE if either argument is TRUE, and FALSE if both are FALSE. The two things that are being checked are whether whether either I2 has changed from "None" to something else (which would be the letter of a fixed allele), or if we have completed 500 generations. If either of these conditions are true cell K2 will become "TRUE". Each time through the Do...While loop we check if cell K2 has become TRUE, and if it is we can stop the Do...While loop.
Run the macro to check that the Do While loop is working
If you got the worksheets set up right, and updated your macro to match the one above you can run it by switching to the Excel file and hitting CTRL+SHIFT+D - it should finish in a couple of seconds.
If you run it repeatedly with the Statistics sheet selected you'll see that an allele goes to fixation most runs, but which one goes to fixation is unpredictable. The amount of time it takes to get to fixation will change (J2 will show a different generation between runs).
If you aren't getting the expected results, or if it takes more than a few seconds for the macro to run, check with me before you go on.
Now that we have each simulation running about as fast as possible we will add another loop that repeats the simulation 100 times. The statistics from individual runs won't matter so much anymore - this time we want to focus on the typical time to fixation and allele that becomes fixed across the 100 runs, so we will record the generation number at the end of the run (which will either be the generation at which fixation happened or 500 if fixation didn't occur at all), and the allele that went to fixation (which will be A, B, C, D, or E, or None if fixation didn't occur). We will then be able to see how often fixation occurs, how long it typically takes for fixation to occur when it does, and confirm our impression that which allele goes to fixation is unpredictable.
To do this we will now:
1. We can repeat the simulation 100 times by adding an additional For..Next loop that encloses the Do While loop. The macro that accomplishes all of this is (changes in green, explanations below):
The changes are:
Once you've finished updating your macro you just need a little more labeling before you run it...in the Results sheet, label J1 "Run number", K1 "Allele fixed", and L1 "Generation".
2. You're going to want to run through the simulation a few times in debugging mode to make sure everything is working - it takes a couple of minutes to run this thing on my computer, and may take longer on yours, so you don't want to wait for 5-10 minutes for the simulation to finish and only then find out something wasn't programmed correctly. That's just frustrating.
You should be fairly confident that the Do While loop is working, so primarily what you want to check is that the year of fixation and allele fixed is recorded each time through the For...Next loop, and that everything gets reset properly between the end of one run and beginning of the next. So, do the following:
2. Once you're confident that everything is working, run the simulation! It takes about 2-3 minutes on my computer. As long as debugging went well, and the progress indicator is showing progress, let it run until it finishes.
When the simulation completes you'll have all of the allele frequency data you had previously for the final run (run 100). Additionally, you'll have the time to fixation for all 100 runs, which are in columns J, K, and L. You'll want to know a few things:
Congratulations! Save your first two files someplace safe, you're going to make modifications of this basic model for the next two steps.