Genetic drift simulation in R

Consider how the model is working

As you are working through the models, start thinking about how our modelling decisions are representing the process of a population passing on its genes from parents to offspring over time. Some of the choices we are making are not terribly faithful representations of how a real biological population would work. Ask yourself questions such as:

Is population size really going to stay exactly the same in a real population over 500 generations? If not, why are we keeping it constant in our models?
What kind of organism might the simulation best represent? Broadcast spawners are aquatic animals that release sperm and eggs into the water, where fertilization occurs - is this the kind of organism we are simulating?
Is the way we are selecting males and females to breed realistic? Are every male and female in a population equally likely to mate with one another?
Do most species breed and then die, with no overlap in generations? So all offspring produced usually survive to adulthood?
There is no natural selection in these models, only drift. How might the results be different for genes that are subject to natural selection?

But, simulation models have some distinct advantages that justify their use.

Simulation models allow us to run experiments that aren't possible in real populations. We may be able to do field work that tests whether gene frequencies are varying over time in a population with random mating, but a single population will only have a single pattern of change over time. Our ability to run multiple identical runs to see what happens in general is tremendously beneficial. Additionally, many processes we want to study take a very long time to complete, and we can't study them in real time - a species that takes a year to mature and reproduce can't be followed over 500 generations by any one researcher (and no studies that I'm aware of have been continued for 500 years).
Simulation models allow us to isolate variables. Being able to isolate variables by holding everything else constant is a very important principle of experimental design, and we have an unrivaled ability to isolate variables in a simulation model. For example, we're holding population size constant, knowing that in real populations it will usually vary from generation to generation. Although a constant population size may not be very realistic, if we were not able to hold population size constant we wouldn't be able to tell how much of the variation in allele frequency was due to random mating and how much was due to fluctuation in population size over time.
We can explore the consequences of the things that we know. We know that when there is random mating we expect drift, but how does an increase in population size affect the rate of loss of alleles to drift? What happens if instead of a closed population (with no immigration or emigration) there is immigration into the population?

As long as the way the model works is not so completely unrelated to how biological systems work that we will be misled by using them, it can be very useful to use models to help us understand real populations.

You can address questions like this in the Discussion of your papers when you're talking about potential problems with the model, or limitations on what we can learn from it.

First variation: Double the population size

Download this file into your project folder and open it to continue with the last two simulations.

The smallest number of each sex we can use while keeping all 5 alleles at their HW equilibrium is 25 apiece. To make it easy to set the population size we can edit the run.sim() function to take an argument that specifies how many sets of 25 males and 25 females to use.

Copy and paste the run.sim code from the first version of the model into the siumulation.run.function.popsize code chunk.

Then, change the first line to accept an argument called multiple, with a default number of 1, like so:

run.sim.popsize <- function(multiple = 1) {

When the function is run, we will use run.sim.popsize() as the command, and if we leave the parentheses empty the default value of 1 will be used. If we specify a number (such as 2 to double the number of parents, or 3 to triple it, etc.) the numbers of individuals will be adjusted accordingly.

To use the multiple argument, make space for these lines next, and enter them just below the run.sim.popsize function definition line:

n.males <- 25 * multiple

n.females <- n.males

n.parents <- n.males + n.females

This calculates the number of males to be used by multiplying 25 by the multiple, then sets the number of females to be the same as the number of males. The number of parents is then just the sum of the number of males and number of females.

The next place we need to adjust is making the males from the list of five alleles. We make the vector of alleles as before, and use expand.grid() to make the initial 25 of them. We then need to change the command:

males[ rep(1:25, 2), ] -> males

to read:

males[ rep(1:25, multiple), ] -> males

Replacing 2 with the value we've entered for multiple when we run the function allows us to pick different population sizes.

The next place we need to update is the part of the loop where we make the male and female offspring from the male and female breeders. Now that the number of males and females can be different from 50, we need to change the hard-coded values to be one of the variables we defined above for n.males, n.females, or n.parents as needed.

Find the commands:

males[sample(1:50, 100, replace = T),] -> male.breeders
females[sample(1:50, 100, replace = T), ] -> female.breeders

and change them to:

males[sample(1:n.males, n.parents, replace = T),] -> male.breeders
females[sample(1:n.females, n.parents, replace = T), ] -> female.breeders

Using n.males, n.females, and n.parents like this allows all of these values to change when multiple is set to a different number.

Similarly, find the lines:

data.frame(Allele1 = offspring.a1[1:50], Allele2 = offspring.a2[1:50]) -> male.offspring
data.frame(Allele1 = offspring.a1[51:100], Allele2 = offspring.a2[51:100]) -> female.offspring

and change them to:

data.frame(Allele1 = offspring.a1[1:n.males], Allele2 = offspring.a2[1:n.males]) -> male.offspring
data.frame(Allele1 = offspring.a1[(n.females+1):(n.parents)], Allele2 = offspring.a2[(n.females+1):(n.parents)]) -> female.offspring

The rest of the commands are fine the way they are, because they don't use the number of males, females, or parents as an argument.

Once you have made these update, run the chunk to get the command into your environment.

Then, in chunk lapply.repeat.100.times.popsize, use an lapply() command to run the run.sim.popsize function 100 times. Enter a 4 as an argument (we used 50 of each sex the first time using a multiple of 2, so to get 100 of each sex we need to use a multiple of 4), and assign the output to sim.output.popsize.

The method we're using to get the generation of fixation still works, but with a larger population size we will not always have fixation by the 500th generation. The first command, which gets the generation of fixation, is the same as before, but use sim.output.popsize as the first argument in the sapply() function, and assign the output to gen.fixed.n200 (in the sapply.to.get.fixation.gen.popsize chunk).

For any run that didn't get an allele going to fixation we'll get a value of "Inf" (for infinite) in the output. We can check for which runs had an Inf value in the output, and assign them a value of 501 as the generation fixed. In the same code chunk (sapply.to.get.fixation.gen.popsize), below your first sapply() command, enter another like so:

sapply(gen.fixed.n200, FUN = function(x) if(is.infinite(x)) {x = 501} else {x = x}) -> gen.fixed.n200

You'll This uses an if() function to check if an entry in gen.fixed.n200 is infinite, and replace it with a value of 501 if it is. If it is not, the else statement says to replace the generation fixed with itself (this seems to be necessary, as if you don't put something in x it gets deleted).

Similarly, our method of finding which allele want to fixation is the same as before, but if one allele never went to fixation we want to detect that and record it properly. In the chunk sapply.to.get.allele.fixed.popsize enter the same command you used before, but use sim.output.popsize as the argument and assign to an object called allele.fixed.n200.

Then, on the next line, enter:

sapply(allele.fixed.n200, FUN = function(x) if(length(x) == 0) {x = "None"} else {x = x}) -> allele.fixed.n200

The idea here is very similar to what we did to assign 501 as the generation fixed when no allele went to fixation, but now the indication that no allele is fixed is that the output for the run in allele.fixed.n200 is empty. An empty element has a length of 0, so we can compare the length of the element to 0 and assign a value of "None" as the allele fixed. Otherwise we retain the allele letter that did actually went to fixation.

You can now run through the rest of the steps - combine the generation fixed and allele fixed for each run into a data frame. Get a mean generation fixed, tabulate how often each allele went to fixation (and how often none did), and graph the boxplot for generation fixed.

Second variation: immigration

When an individual moves to a new population, it brings its genes along with it. Immigrants can thus restore alleles that a population lost to drift, and may be able to prevent alleles from being lost in the first place by increasing the frequencies of rare alleles.

Immigration has a smaller effect on big populations than on a small population, because a single pair of alleles is a smaller contribution to a big gene pool than a small one. Big populations also lose alleles to genetic drift more slowly than small populations do (as you should have seen with your n200 model, above). These two factors offset one another perfectly in theory, such that a single immigrant per generation is expected to prevent loss of alleles to drift regardless of the size of the population. This next version of the model will explore whether 1 immigrant per generation is enough to avoid loss of alleles in our simulated population.

1. Copy and paste the function from simulation.run.function.popsize into chunk simulation.run.function.immigration. We'll make the needed changes to add immigration to this function.

2. Change the first line to:

run.sim.final <- function(multiple = 1, immigration = FALSE) {

This adds in an argument to our function that allows us to include immigration. The default is to have no immigration, but if we enter TRUE as the second argument we'll get one male immigrant per generation added to the breeding population.

3. The only other change we need to make is to replace one of the male breeders with an immigrant each generation. So, in the lines after the males and female breeders are selected, but before they produce offspring, add this if() command (to help you find where to put the if() command I included the lines you're inserting it between, but only the if() part needs to actually be added, the rest is there already):

males[sample(1:n.males, n.parents, replace = T),] -> male.breeders
females[sample(1:n.females, n.parents, replace = T), ] -> female.breeders

if(immigration == TRUE) {
male.breeders[1,1] <- sample(alleles, 1)
male.breeders[1,2] <- sample(alleles, 1)
}

male.breeders[, sample(1:2,1)] -> offspring.a1
female.breeders[, sample(1:2,1)] -> offspring.a2

The if() command tests if we have set immigration to equal TRUE when we called the run.sim.final function, and if we did then we replace the alleles for the first male breeder with ones selected at random from the alleles vector. This is like randomly sampling a male from a large, outbred population in which the five alleles have the same frequencies.

3. Run the code chunk to get the function added to your environment, and then run it 100 times, using 50 of each sex (multiplier = 2) and with immigration (immigration = TRUE). Assign the output into an object called sim.output.n100.imm.

4. Now get the summaries you need, and make the graphs to see how immigration affects heterozygosity, alleleic diversity, and fixation rates.

Assignment

And with that, you're done with the simulations for the genetic drift project. Now you can write up a report of your results according to the instructions posted on Canvas, and will turn in all four of your simulations (n100, n100_time_to_fixation, n200, and n100_immigration) along with a Word file with your written report.

Genetic drift simulation in R - steps 3 and 4

Consider how the model is working

First variation: Double the population size

Second variation: immigration

Assignment