Repeated measures ANOVA
Medicinal leeches are aquatic annelids that feed on
warm-blooded prey. They feed by attaching with their front suckers,
abrading the skin with a rasp-like jaw in the middle of the sucker, and
drinking the blood as it seeps out. Leeches are capable of consuming a
huge amount of blood in a single feeding, several times their own body
weights - you can see the difference in size between a hungry leech and
a fed leech in the picture to the left.
After a feeding leeches rapidly expel the liquid portion
of the blood, and then more slowly digest the solid components over
time. As ectothermic animals, their metabolic rates are quite low, and a
single blood meal can last them for months.
Leeches can also be fooled into eating arginine solution, but arginine
has little nutritive value for leeches. We will see whether leeches fed
on arginine show a different pattern of weight change over time compared
to leeches fed on blood.
Multiple measurements from the same leech are probably not independent
- a big leech is going to have consistently big masses compared with a
small leech. We could handle this level of dependency by using the
individual leech as a random effect.
However, we expect these animals to lose weight over the course of the
experiment, which would probably introduce some serial dependency - that
is, we expect that measurements of the same leech on successive
days will be correlated, because the weight of a leech on a given
day will be affected by its weight on the day before. We can account
for this serial dependency using repeated measures ANOVA (RMA).
Repeated measures analysis treats time as an ordinal categorical
variable, so our analysis will be very much like a 2-way factorial
ANOVA. Thus, we can ask questions about:
- Whether there are differences in masses between arginine and
blood-fed leeches - this is the between-subjects main effect.
- Whether masses changed over time - this is the within-subjects main
effect.
- Whether the pattern of change over time differed between arginine
and blood-fed leeches - this is the interaction of between- and
within-subjects effects.
There are two different approaches to testing these hypotheses in a
repeated measures analysis - the univariate approach, and the
multivariate approach. If your data meet the assumptions of univariate
RMA it is more statistically powerful than the multivariate approach,
and is the better one to use. However, to use the univariate approach we
need to assume sphericity, which means that the variances of the
differences between every pair of time points must be equal. If we
violate sphericity, but not too badly, we can still use the univariate
approach, but with an adjustment to the p-values. However, if we violate
sphericity badly we need to use the multivariate approach, which does not
assume sphericity at all.
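Formally, sphericity holds when the variances of the differences between every pair of time points are equal. A minimal base-R sketch of what that means, using made-up data (not the leech data set):

```r
# Toy repeated-measures data: 20 subjects measured on 3 days.
set.seed(1)
y <- matrix(rnorm(20 * 3), nrow = 20,
            dimnames = list(NULL, c("d1", "d2", "d3")))

# Sphericity concerns these quantities: the variance of each
# pairwise difference between time points should be about equal.
diff.vars <- c(d1.d2 = var(y[, "d1"] - y[, "d2"]),
               d1.d3 = var(y[, "d1"] - y[, "d3"]),
               d2.d3 = var(y[, "d2"] - y[, "d3"]))
diff.vars
```

Mauchly's test, which we will meet below, formalizes this comparison of difference variances into a hypothesis test.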
Importing the data and graphing the means
1. Start a new project (in a folder called "repeated"), start
a new R script, and import today's data. Download this
file and import the two leech worksheets - the first one (leech)
has the data in the arrangement we need for running a repeated measures
ANOVA, but the other (leeches_stacked) is better for graphing. Call the
first data set leech, and the second leech.stacked.
The weights are expressed as multiples of the initial weight of each
animal, obtained by dividing each weight by the original weight measured
before feeding for the animal. A value of 7 indicates that the leech is
7 times as heavy as it was before feeding, and a value of 1 indicates
that the animal is the same weight as it was before feeding.
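The conversion is simple division; here is a sketch with hypothetical raw masses (the downloaded worksheets already have this done for you):

```r
# Hypothetical raw masses (grams) for two leeches: one pre-feeding
# measurement, then masses on two later days.
raw <- data.frame(initial = c(1.0, 2.0),
                  d1      = c(7.5, 15.0),
                  d2      = c(5.0, 9.0))

# Divide each day's mass by the initial mass to get relative weights.
rel <- raw[, c("d1", "d2")] / raw$initial
rel   # both leeches were 7.5 times their initial mass on day 1
```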
2. Make a plot of means. Make a plot of mean weights,
with "food" and "day" as the factors, and "weight" as the response.
First, load ggplot2:
library(ggplot2)
Make the summarized data set you need for plotting - load the Rmisc
library, and then use summarySE() to get the values you need:
leech.summ <- summarySE(leech.stacked, measurevar =
"weight", groupvars = c("day","food"))
Next, set the global settings with:
leech.gg <- ggplot(leech.summ, aes(x = day, y = weight,
color = food, group = food, ymin = weight - se, ymax = weight + se))
Then add points, error bars, and lines to the plot:
leech.gg + geom_point() + geom_line() +
geom_errorbar(width = 0.1) + labs(y = "Weight")
The final labs() statement changes the y-axis label from "weight" to
"Weight".
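If Rmisc isn't available, the same mean and standard-error columns can be built with base R alone - a sketch using toy data with the same column names as leech.stacked:

```r
# Toy stand-in for leech.stacked: one weight per leech per day per food.
toy <- data.frame(day    = rep(1:2, each = 4),
                  food   = rep(c("arg", "blood"), times = 4),
                  weight = c(7, 8, 7, 9, 5, 7, 6, 8))

# Standard error of the mean: sd / sqrt(n).
se <- function(x) sd(x) / sqrt(length(x))

# One row per day x food combination, with the mean and its SE.
means <- aggregate(weight ~ day + food, data = toy, FUN = mean)
ses   <- aggregate(weight ~ day + food, data = toy, FUN = se)
summ  <- data.frame(means, se = ses$weight)
summ
```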
As expected, the leeches increase their body weight by a factor of 7-10
when they feed, but you should also see that the blood-fed leeches only
lose weight rapidly on the first day or two as they expel liquid from
the blood, and then lose weight much more slowly after that as they
slowly digest the solid components. The arginine-fed leeches appear to
gain less weight during feeding, and lose weight rapidly for several
days in a row. Since the patterns of change over time are quite
different you should expect a food x day interaction.
Repeated measures analysis in R
There are actually four different ways to run a univariate repeated
measures analysis in R (that I know of; there may be more). They are:
- Using aov() with an Error() statement.
- Using Anova() from the car package.
- Using ezANOVA() from the ez package.
- Using linear mixed effects models, from either the nlme or lme4
packages.
We will use the Anova() function from the car package, because it is
flexible and comprehensive. Anova() gives us a test of sphericity, and
provides univariate analysis with and without corrections for violation
of sphericity, as well as multivariate tests. With the output from
Anova() we will have the output we need, whether we meet the sphericity
assumption or not.
The three approaches we aren't going to learn about today are also
interesting. We could use aov(), which is part of base R and doesn't
require any packages to be installed, but it doesn't provide the test
of sphericity we need. The ez package is designed as a set of "wrappers"
that use existing R commands to do the actual work, but that try to
simplify the input required of the user. The linear mixed effects
packages (nlme and lme4) are the preferred platforms for fitting complex
mixed-effects models in R, including repeated measures analysis. If
you're trying to develop your R skills to their fullest, you may want to
try out these different packages - there are lots of tutorials available
with a little Googling.
There are several steps we need to take to use the Anova() function for
repeated measures analysis.
1. Prepare the data for analysis. To use Anova() we
need to provide it with several things:
- An object that defines the within-subjects design - it's possible
to have treatment levels that we need to account for in the
within-subjects part of the design. We could, for example, have kept
the leeches in cold water for the first four days, and then
increased the temperature for the last three, in which case
temperature would be an additional within-subjects factor. In this
case, since we have only one within-subjects variable (day), we have
a very simple within-subjects design - all we need to do is make
a vector that lists the time points, with no additional grouping
information. We will call this object "time.factor".
- A data frame that gives names for the different repeated time
points (which we will call "time.frame").
- A fitted lm() model that includes the between-subjects effect for
each of the time points (which we will call "time.model"). This is
actually a multivariate linear model, with all of the time points
treated as responses, but Anova() will use it to
construct a univariate RMA.
To set up the object that defines the different levels of the repeated
factor, use:
time.factor <- ordered(seq(1,7))
This command generates a sequence of numbers from 1 to 7 and then makes
an ordinal factor out of them. We have seven days of measurements, and
Anova() will use this object to label them. You won't see any output to
the screen because the factor was assigned to time.factor, but if you
type the name of the object its contents will be displayed:
[1] 1 2 3 4 5 6 7
Levels: 1 < 2 < 3 < 4 < 5 < 6 < 7
Next we need a data frame with the names of the time points, which we
can get by converting time.factor into a data frame:
time.frame <- data.frame(time.factor)
The new time.frame object contains only time.factor, but it is now a
column in a data frame. The time.frame object defines the within-subjects
design, and since we only have a single within-subjects factor (time),
we just need one column with the time points identified.
Next, we need to extract the columns from leech that have the mass
measurements in them. Use the command:
time.bind <- as.matrix(leech[,3:9])
This command extracts the masses in columns 3 through 9 and assigns
them to a matrix called time.bind.
Finally, we use the lm() command to set up a fitted model - this is not
yet a repeated measures analysis, but the Anova() command will use it to
produce a repeated measures analysis in the next step. To fit the model,
use the command:
time.model <- lm(time.bind ~ leech$food)
The time.model object is actually a multivariate linear model for the
between-subjects factor (food), that also includes all of the basic
quantities Anova() needs to do univariate repeated measures ANOVA.
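You can verify this on toy data: giving lm() a matrix response makes it fit a multivariate linear model (class "mlm"), with one column of coefficients per time point.

```r
# Toy data: 10 subjects, a 2-level group factor, and 3 repeated measurements.
set.seed(2)
grp  <- factor(rep(c("a", "b"), each = 5))
resp <- matrix(rnorm(10 * 3), nrow = 10,
               dimnames = list(NULL, c("t1", "t2", "t3")))

# A matrix response produces a multivariate linear model.
fit <- lm(resp ~ grp)
class(fit)      # "mlm" "lm"
dim(coef(fit))  # 2 coefficients (intercept, group) x 3 time points
```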
2. Run the analysis. First, make sure the "car"
library is loaded - it has the Anova() function we need:
library(car)
Generate the analysis with the commands:
leech.rma <- Anova(time.model, idata =
time.frame, idesign = ~time.factor)
summary(leech.rma)
The first command, Anova(), produces the analysis and places the output
into object leech.rma, and the second, summary(), dumps the output to
the screen so you can see it.
3. Interpret the tests of main effects and interactions.
The Anova() command produces a lot of output, because both multivariate
and univariate tests are produced, as well as the sphericity tests we
need in order to decide which output to interpret. To find the
univariate tests, scroll down until you find "Univariate Type II
Repeated-Measures ANOVA Assuming Sphericity". It looks like this:
Univariate Type II Repeated-Measures ANOVA Assuming Sphericity

                           SS num Df Error SS den Df       F    Pr(>F)
(Intercept)            5435.6      1  160.520     18 609.517 2.470e-15 ***
leech$food              545.5      1  160.520     18  61.165 3.379e-07 ***
time.factor             407.1      6   35.649    108 205.557 < 2.2e-16 ***
leech$food:time.factor   72.2      6   35.649    108  36.445 < 2.2e-16 ***
Main effect of food type: The leech$food row is the test
of the main effect of food - you can see that it is
significant. The organization of this table is different from what
you're used to - instead of having sub-tables with different error terms,
Anova() puts both the numerator and denominator for each test into a
single row of the table. For the test of food type this means that:
- The test of the main effect of food type is labeled leech$food.
- The "SS" column is the sums of squares for food type, and "num DF"
is the degrees of freedom for this between-subjects factor (with 2
food types we have 1 df).
- The denominator for the test of differences in food types is
variation in average masses for each leech. The "Error SS" and "den
DF" refer to variation between leech averages, and number of leeches
minus 2, respectively.
- The "F" column is the ratio of (SS/num Df)/(Error SS/den Df) for the
row.
Since the main effect of food is tested using the averages for each
leech, and doesn't use the repeated measures themselves, the sphericity
assumption doesn't pertain to the test of this between-subjects effect.
However, before we can interpret the main effect of food we need to
look at the food x time interaction to make sure any general statements
we make about effects of food are true at every time point. The table
that Anova() produces doesn't split the test of the main effect of food
from the test of the main effect of time or the time x food interaction,
but if you look at the den DF column you'll see that the denominator
degrees of freedom for both time.factor and the leech$food by
time.factor interaction lines is 108, whereas it was 18 for the main
effect of food. The time and food x time effects use an error term that
is based on the repeated measurements, and thus we have to worry about
the sphericity assumption for those two effects - even though they are
all plopped together in the same table, we can't interpret the main
effect of time or the food x time interaction until we have dealt with
the sphericity assumption.
Main effect of time, and the food by time interaction - the
procedure:
How we test for the main effects of time and the food by time
interaction depends on whether we meet the sphericity assumption of
univariate repeated measures ANOVA. If we fail the Mauchly sphericity
test, our next step also depends on how badly we are violating the
assumption. To summarize, the procedure is:
Mauchly test result               Epsilon            Results to use
Passed - sphericity assumed       Any                Univariate Type II tests
Failed - sphericity not assumed   Greater than 0.7   HF- or GG-adjusted p-values
Failed - sphericity not assumed   Less than 0.7      Multivariate tests
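To make the logic of this procedure explicit, here it is as a small hypothetical helper function (not part of car - just a sketch of the decision rule):

```r
# Sketch of the decision rule for choosing which RMA results to report.
# mauchly_p: p-value of Mauchly's sphericity test; eps: GG or HF epsilon.
rma_results_to_use <- function(mauchly_p, eps) {
  if (mauchly_p >= 0.05) "univariate Type II tests"
  else if (eps > 0.7)    "GG- or HF-adjusted p-values"
  else                   "multivariate tests"
}

rma_results_to_use(0.3, 0.9)       # sphericity met
rma_results_to_use(0.001, 0.8)     # violated, but mildly
rma_results_to_use(3.2e-19, 0.37)  # violated badly, as in our leech data
```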
What does the Mauchly test tell us? Below the ANOVA table is Mauchly's
test of sphericity - it looks like this:
Mauchly Tests for Sphericity

                       Test statistic    p-value
time.factor                0.00015747 3.1978e-19
leech$food:time.factor     0.00015747 3.1978e-19
Like all of our other assumption tests, the null hypothesis for Mauchly
is that we meet the assumption. With a p-value less than 0.05 we fail
the test.
The next block of output gives the epsilon estimates using two
different estimation methods, the Greenhouse-Geisser method and the
Huynh-Feldt method.
Greenhouse-Geisser and Huynh-Feldt Corrections
 for Departure from Sphericity

                        GG eps Pr(>F[GG])
time.factor            0.37125  < 2.2e-16 ***
leech$food:time.factor 0.37125  3.299e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

                          HF eps   Pr(>F[HF])
time.factor            0.4260826 1.753500e-25
leech$food:time.factor 0.4260826 2.097151e-11
The two different epsilon estimates are different from one another, but
both are below 0.7. If epsilon was 0.7 or bigger we would use one of the
two adjusted p-values presented as our test of time and time x food
interaction, instead of using p-values from the Type II ANOVA table
above. The corrected p-values will generally be bigger than those in the
Type II table - adjusting for a lack of sphericity reduces our power
somewhat.
If you were able to use these corrected p-values, which should you use?
The general advice is to use GG, particularly if epsilon is 0.75 or
less. They tend to be very similar to one another when epsilon is bigger
than 0.75, so the decision is not a very important one - using GG
routinely is fine.
In our case we violated sphericity with an epsilon value that is too
small (< 0.7) to use the GG or HF corrected p-values, so we should
use the multivariate tests instead - the multivariate
approach to testing repeated measures doesn't assume sphericity at all,
so we can use it no matter how small the epsilon value is.
If you search through the mass of output, you'll find a table for each
term in the model that looks like this (they aren't all together, but
I've gathered them together to make it easier for you to see them):
Multivariate Tests: (Intercept)
                 Df test stat approx F num Df den Df     Pr(>F)
Pillai            1   0.97132 609.5174      1     18 2.4705e-15 ***
Wilks             1   0.02868 609.5174      1     18 2.4705e-15 ***
Hotelling-Lawley  1  33.86208 609.5174      1     18 2.4705e-15 ***
Roy               1  33.86208 609.5174      1     18 2.4705e-15 ***

Multivariate Tests: leech$food
                 Df test stat approx F num Df den Df     Pr(>F)
Pillai            1  0.772626 61.16487      1     18 3.3788e-07 ***
Wilks             1  0.227374 61.16487      1     18 3.3788e-07 ***
Hotelling-Lawley  1  3.398048 61.16487      1     18 3.3788e-07 ***
Roy               1  3.398048 61.16487      1     18 3.3788e-07 ***

Multivariate Tests: time.factor
                 Df test stat  approx F num Df den Df     Pr(>F)
Pillai            1   0.99258  289.8905      6     13 4.5181e-13 ***
Wilks             1   0.00742  289.8905      6     13 4.5181e-13 ***
Hotelling-Lawley  1 133.79560  289.8905      6     13 4.5181e-13 ***
Roy               1 133.79560  289.8905      6     13 4.5181e-13 ***

Multivariate Tests: leech$food:time.factor
                 Df test stat approx F num Df den Df     Pr(>F)
Pillai            1  0.965179 60.05718      6     13 9.9717e-09 ***
Wilks             1  0.034821 60.05718      6     13 9.9717e-09 ***
Hotelling-Lawley  1 27.718697 60.05718      6     13 9.9717e-09 ***
Roy               1 27.718697 60.05718      6     13 9.9717e-09 ***
The term that is being tested is identified after the "Multivariate
Tests:" label - there is a test of the intercept, of the main effect of
food, of the main effect of time, and of the time x food interaction. For
each of the terms being tested there are rows labeled Pillai, Wilks,
Hotelling-Lawley, and Roy - each of these is a different multivariate test
statistic. For each one it's possible to calculate an approximate F test
statistic, which is used to calculate the p-value for the test - all of
the approximate F values are the same across all four multivariate test
statistics, so you would only need to present one multivariate test for
each term. Wilks is most commonly reported, so report that row for each
test when you fill in your worksheet.
I told you earlier that you don't need to worry about the sphericity
assumption with respect to the main effect of the between-subjects factor,
food, yet you get a multivariate test for this term. If you look at
the multivariate test of the food main effect you'll see that
the results are identical to the univariate tests - same numerator and
denominator degrees of freedom, approximate F-value equal to the
univariate test's F-value, same p-value. If you need to use multivariate
tests for the effect of time and food x time it's a good idea to use the
multivariate results for food as well just to avoid unneeded complexity in
explaining your methods, but the results are the same whichever one you
pick.
Post-hoc analysis of an RMA
Post-hoc comparisons in a repeated measures ANOVA can take several
different forms, depending on what we want to know about the results.
With no interaction of between and within-subjects factors
The simplest possibility would be a case in which there are main effects
of the between-subjects factor (food) and/or the within-subjects factor
(time) but no interaction. In that case we can do the following:
- Between-subjects main effect - Tukey comparisons among treatment
groups (if needed - with only two groups we only need to know that the
main effect was significant).
- Within-subjects main effect - post-hoc comparisons among time points
of interest. Since the data are repeated measurements over time,
paired comparisons should be used. With 7 time points there are 21
pairs of means if we compare all time points to one another. To reduce
the number of comparisons we may choose to only compare:
  - Successive time points - we may only want to compare one day
against the next - day 1 to day 2, day 2 to day 3, etc.
  - Against initial conditions, or final conditions - we may want to
find which days are different from starting conditions (comparison
of each day to the first), or how long it takes to return to
baseline (comparison of each day to the final day).
With an interaction of between and within-subjects factors
We have an interaction between food and day, which tells us that the
pattern of change over time depends on the treatment (or, equivalently,
that the amount of difference between the treatments depends on the day).
We still would like to avoid comparing all possible pairs of means,
because with 7 time points and 2 foods we have 14 groups, which gives us
91 pairs of means to compare. Many of these comparisons are probably not very
interesting to us, though - the comparison of the first day in the blood
group to the last day in the arginine group is probably not all that
experimentally important. A more restricted, targeted set of comparisons
may be all that we need.
The choice of which comparisons to make depends largely on your research
question, but there are a couple of common approaches.
Comparisons of between-subjects groups at each time point:
If finding the time points at which the treatments differ is important we
can compare the food treatments at each day. Since no comparisons are made
between different days we will only have 7 comparisons to do.
R actually makes this pretty easy - we can use aov() to conduct an ANOVA
for all 7 of the daily mass measurements with a single command. Comparison
of foods within a time point is a comparison between two independent
groups, so paired comparisons aren't needed, and with only two food levels
we can use the p-value from an ANOVA to tell if the treatments are different
(no Tukey tests required unless there are 3 or more levels to compare).
Use the command:
summary(aov(time.bind~food, data=leech))
The time.bind object has the masses for each day; using it as the
response variable tells R to run the analysis once for each of the columns
in time.bind. This command will thus give you a separate ANOVA table for
each day of masses in time.bind. Bear in mind that with 7 tests you should
not consider anything greater than 0.05/7 = 0.0071 to be significant.
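Equivalently, base R's p.adjust() can apply the Bonferroni correction for you, so you can compare the adjusted p-values to 0.05 directly (toy p-values shown here, not the real leech results):

```r
# Toy p-values from 7 hypothetical per-day ANOVAs.
p <- c(0.001, 0.004, 0.008, 0.02, 0.04, 0.3, 0.6)

# Bonferroni adjustment multiplies each p-value by the number of tests
# (capping at 1), so the adjusted values can be compared to 0.05 directly.
p.bonf <- p.adjust(p, method = "bonferroni")
p.bonf

# This is equivalent to comparing the raw p-values to 0.05/7.
p.bonf < 0.05
p < 0.05 / 7
```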
Comparison between time points within each group:
Instead of comparing between the treatment groups, we might want to know
which time points are different within each of the two food types.
Still, we have a choice to make...
- We may only be interested in comparing sequential differences (that
is, differences between d1 to d2, d2 to d3, d3 to d4, etc.) for
blood-fed animals, and the same sequence for arginine-fed animals.
- We may be interested in the time points that are different than
starting conditions, in which case we should compare days 2 through 7 to
day 1.
Regardless of which we choose, we will use paired t-tests for these
comparisons instead of ANOVAs or two-sample t-tests, because we're
comparing two sets of measurements on the same individuals on different
days.
We will use a process that will give us the p-values, degrees of freedom,
and test statistics that we want, but it will take a couple of steps.
1. Calculate the differences between sequential time points.
The data are in an object called leech, with the masses for each day in
columns 3 through 9. We can calculate differences between successive time
points with:
leech[,3:8] - leech[,4:9] -> time.diffs
Since the column references are all different by 1, this command will
subtract column 4 from 3, column 5 from 4, and so on, and put the
differences in time.diffs. We can make these differences a data frame,
with the command:
data.frame(time.diffs) -> time.diffs
If you look at time.diffs (by typing time.diffs at the command prompt)
you'll see that the labels are carried over from the first set of columns
- they are the labels for the first through the 6th day, rather than
labels that indicate which two days are being compared. We can fix this by
using both of the days being compared in the column labels:
colnames(time.diffs) <-
paste(colnames(leech[,3:8]), colnames(leech[,4:9]), sep = ".")
The paste() command combines arguments together, using the separator
identified with "sep" statement. We are using the column names for columns
3 to 8, and combining them in pairs with the column names in 4 to 9, using
a period as the separator. This will help us keep track of what is being
compared as we assemble our results.
2. Split the data by food type. We want to split the
data in two by food group, which we can do with:
split(time.diffs, leech$food) -> diffs.split
The split() command takes a data frame we want to split, and a factor
that has the groups we want to split by, and makes a list
with data sets for each of the groups. A list is another R data type,
which has named elements that can hold other data types (a data frame, or
a matrix, for example). If you type the name:
diffs.split
you will see you have two named sections, one for $blood, and one for
$arg, with the differences for each. You can extract a named element by
name using the syntax:
diffs.split$arg
to get the arginine differences as a group. To pull out just the
differences for d1.d2, you would change this to:
diffs.split$arg$d1.d2
3. Conduct the paired t-tests. Now that we have a list
with our two data sets, we can get our t-tests done using a single command
that applies a function to each named element of a list. Use the command:
lapply(diffs.split, function(i) apply(i, MARGIN = 2,
FUN = t.test)) -> diffs.ttests
This command actually uses two different versions of the apply function
nested inside of one another to get all of the tests that we want done in
one command. If you have done any programming, this command functions as a
nested loop - the way that it works is:
- The outer function is lapply(), and this is used to extract the data
from each named element of diffs.split one at a time and apply a
function to them. The first argument is the name of the list
(diffs.split). From this list, lapply() extracts the first named element
($arg) and passes it to the function specified as the second argument.
The function is specified as function(i) apply(). At this point, control
is passed to the apply() function.
- apply() is used to apply a function (identified by FUN =) to either
the rows (if MARGIN = 1) or columns (if MARGIN = 2) of a data frame or
a matrix. Our differences between days are in columns, so we're using
MARGIN = 2. The first time through the apply() function will pass the
differences in the first column of diffs.split$arg (i.e. the
differences between d1 and d2) to the function specified by FUN.
- The FUN = t.test statement causes the function
t.test(diffs.split$arg$d1.d2) to be run. Running t.test() on a single
column of data without any other arguments gives us a one-sample
t-test of the mean of the column of data against 0. A one-sample
comparison of differences against a hypothetical mean difference of 0
is a paired t-test, so this will give us paired t-tests of the
differences between successive days.
- apply() then moves to the next column, which has the differences
between d2 and d3, and performs a t-test on those differences. This
continues until all of the columns in diffs.split$arg are processed,
at which time apply() passes control back to the lapply() function.
- lapply() moves to the second named element, $blood, and passes it to
apply() to conduct all of the t-tests on the columns in $blood. When
apply() completes the t-tests for all the columns of differences in
$blood, the complete set of t-test results are returned and put into
diffs.ttests.
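The nested structure is easier to see on a tiny made-up example (fake differences, same shape as diffs.split):

```r
# Two groups, each a data frame of differences (3 subjects x 2 comparisons).
toy.split <- list(
  arg   = data.frame(d1.d2 = c(-1.0, -1.2, -0.8), d2.d3 = c(-0.5, -0.4, -0.6)),
  blood = data.frame(d1.d2 = c(-2.0, -2.1, -1.9), d2.d3 = c(-0.1,  0.0, -0.2))
)

# For each group (lapply), run a t-test on each column (apply); a
# one-sample t-test of differences against 0 is a paired t-test.
toy.ttests <- lapply(toy.split, function(i) apply(i, MARGIN = 2, FUN = t.test))

names(toy.ttests)              # "arg" "blood"
toy.ttests$arg$d1.d2$p.value   # paired t-test p-value for arg, d1 vs d2
```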
We are going to take advantage of R's ability to apply functions
repeatedly to lists, data frames, or matrices again later in the semester
- it is a little confusing at first, but the apply() family of functions
make it possible to do a lot of repetitive work with just a few commands.
4. Report the results. If you look at the output by
typing:
diffs.ttests
you'll see that diffs.ttests is a list of t-tests, labeled first by the
food type and then by the days being compared - for example, $arg$d1.d2
holds the test for the arginine-fed leeches, days 1 and 2. The
labeling is hierarchical - meaning, if we used the command:
diffs.ttests$arg
we get all of the t-tests done on the arginine-fed leeches, labeled by
the days compared.
It's a long list, and it would take some patience and care to record each
p-value, test statistic, and degrees of freedom for all of the tests.
Fortunately, there is a better way.
First, you should understand that the output of a t-test is also an R
object made up of several named elements. We can see what the names are
for the arginine d1.d2 t-test by typing:
names(diffs.ttests$arg$d1.d2)
You'll see the following:
[1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"
[6] "null.value"  "alternative" "method"      "data.name"
We don't need all of these values - we only need the test statistic,
degrees of freedom, and p-value, which are the named elements "statistic",
"parameter", and "p.value". To extract p-values from all of the tests in
the diffs.ttests list, we would use:
sapply(diffs.ttests, function(i) sapply(i, function(y)
y$p.value)) -> pvals
The sapply() function is just like the lapply() that you used above, but
it returns a matrix instead of a list. This command is actually two
sapply() functions nested inside one another, and it works like this:
- The outer sapply() uses diffs.ttests as its first argument. It passes
the first named element, $arg, to the function defined in its second
argument. The function(i) statement says that the data passed on to the
function is going to be called i within the function definition.
- The function that is defined is another sapply(). This second sapply()
uses i as its first argument, which will be the $arg t-tests the first
time through. The $arg t-tests are thus passed to the inner sapply()
function.
- The inner sapply() applies a function to the first named section
within i, which will be d1.d2. The named sections used for the inner
sapply() are called y - so, y will be the d1.d2 t-test initially.
- The function defined in the inner sapply() simply extracts the
p-value from the t-test result contained in y.
- The inner sapply() moves on to the next t-test result (d2.d3), and
extracts its p-value. This continues until all the t-tests in $arg
have been processed, and then control is passed back to the outer
sapply() function.
- The outer sapply() moves on to the second named element, $blood, and
repeats the process with the $blood t-tests.
- Once the $blood p-values have been extracted, the outer sapply()
returns the p-values for both $arg and $blood. These are assigned to an
object called pvals.
If you type pvals at the command line, you'll see a matrix of p-values,
with row labels indicating the days being compared, and column labels
indicating the food type.
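Here is the same nested-sapply extraction on a tiny made-up example, so you can see the matrix it builds:

```r
# Made-up differences, same shape as diffs.split: two groups, two comparisons.
toy.split <- list(
  arg   = data.frame(d1.d2 = c(-1.0, -1.2, -0.8), d2.d3 = c(-0.5, -0.4, -0.6)),
  blood = data.frame(d1.d2 = c(-2.0, -2.1, -1.9), d2.d3 = c(-0.1,  0.0, -0.2))
)
toy.ttests <- lapply(toy.split, function(i) apply(i, MARGIN = 2, FUN = t.test))

# Inner sapply: pull p.value from each test in one group; outer sapply:
# do that for every group, simplifying the result to a matrix
# (rows = comparisons, columns = groups).
toy.pvals <- sapply(toy.ttests, function(i) sapply(i, function(y) y$p.value))
toy.pvals
```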
Similarly, the test statistics can be obtained by changing the function
definition for the inner sapply():
sapply(diffs.ttests, function(i) sapply(i, function(y)
y$statistic)) -> tstats
And, the degrees of freedom can be obtained with:
sapply(diffs.ttests, function(i) sapply(i, function(y)
y$parameter)) -> dfs
You now have p-values in an object called "pvals", test statistics in an
object called "tstats", and degrees of freedom in an object called "dfs".
To combine all of these statistics into a single data frame, you can use:
data.frame(p = pvals, t = tstats, df = dfs)
The row labels show the comparison, and the column labels identify the
statistic - R automatically appended arg or blood to the t, df, and p
column names because data frames can't have two columns with identical
names.
You can change the ordering of the columns if you want to put the
arginine and blood stats together. First put the comparisons into an
object:
data.frame(p = pvals, t = tstats, df = dfs) ->
ttest.results
Then to re-order the columns you just need to use the command:
ttest.results[ , c(1,3,5,2,4,6)]
This command is just the name of the ttest.results object, with square
brackets used to refer to the rows and columns it contains. The first
entry is the row identifier, and leaving it blank gives us all of the
rows. The second entry is the column identifier, and using c(1,3,5,2,4,6)
specifies the ordering of the columns to be reported.
If you want to challenge yourself... try modifying this procedure to
compare days 2 through 7 to day 1. This is the kind of comparison you
would do if your first measurement was of initial conditions, and you
wanted to identify the point at which the masses became different from
the starting point.
Optional - testing trends over time
Rather than testing for differences between means, you
may simply want to know the nature of the change over time
(linear, quadratic, cubic, etc.). Unfortunately, this is not perfectly
straightforward using the car package - it does include a function
called linearHypothesis() that lets us test for different time trends,
but the syntax is complicated and not at all intuitive. If this is
something you might want to do on your own data, the following will walk
you through the process. If not, you can skip this step.
We will just test the trends for the main effect of time. First, we
need to create the "model matrix" that has the orthogonal polynomial
weights (remember, these take the place of dummy codes for ordered
factors, and model the different time levels as a linear trend + a
quadratic trend + a cubic trend...up to one less than the number of
factor levels). Do that with the command:
model.matrix(~time.factor) -> time.model.matrix
If you type the name of the object, time.model.matrix, you'll see a
matrix with a column for an intercept, and then columns with
coefficients for each type of trend - rows are the seven days.
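If you want to see where these weights come from, they are the orthogonal polynomial contrasts R generates for an ordered factor - a column of the model matrix should match the corresponding column of contr.poly() (assuming time.factor has seven levels, as in this experiment):

```r
# The linear-trend column of the model matrix should match the first
# column of the orthogonal polynomial contrasts for 7 levels.
contr.poly(7)[ , 1]            # linear weights for a 7-level factor
time.model.matrix[ , 2]        # compare with the second model-matrix column
```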
To test the linear trend in the data, we would use the command:
linearHypothesis(time.model, "(Intercept) = 0",
P=time.model.matrix[ , 2, drop=FALSE])
This command uses the time.model object we already created for Anova();
time.model is the within-subjects design of the analysis. Our polynomial
weights are centered on 0, so testing if the intercept is equal to 0 is
equivalent to testing the trend. The argument P=time.model.matrix[ , 2,
drop=FALSE] tells linearHypothesis() to use the second column in our
time.model.matrix in the hypothesis test. The second column is where the
linear weights are, so this will test for linear trend. To test for
quadratic trend, you can just recall the command (with the up arrow key),
and change the 2 to a 3 in time.model.matrix[ , 3, drop = FALSE].
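Rather than recalling and editing the command once per trend, you could loop over the polynomial columns - this is just a convenience sketch built from the same linearHypothesis() call as above:

```r
# Test each polynomial trend in turn: column 2 of the model matrix is the
# linear trend, 3 is quadratic, 4 is cubic, and so on up to degree 6.
for (j in 2:ncol(time.model.matrix)) {
  cat("\nTrend of degree", j - 1, ":\n")
  print(linearHypothesis(time.model, "(Intercept) = 0",
                         P = time.model.matrix[ , j, drop = FALSE]))
}
```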
Random effects
The second half of this assignment deals with random effects. We will use
data on the gestation lengths and masses of a selection of mammals from
different taxonomic groups, which you will find in the worksheet called
"mammal" in the Excel spreadsheet you downloaded - import the data into a
data frame called "mammal".
If you open the file to view it, you will see that each row is a species
of mammal, with columns indicating the taxonomic order, family, genus, and
species (remember, a species name is a combination of its genus and a
specific epithet - the genus column has the genus name, and the species
column has the specific epithet). There is also a column for the mass (in
grams) and the gestation length (in months) for each species. The natural
logs of gestation length and mass are in the columns loggest and logmass -
both variables tend to be strongly right-skewed, and log transformation
improves normality.
All taxonomic levels group organisms together that are related, but
higher levels in the taxonomic hierarchy encompass more distantly related
organisms than lower levels. All of the species in our data set are in the
class Mammalia - they are all mammals. The first level that is recorded in
the data set is order - the orders in the data set are:
- Artiodactyla - hoofed animals with even numbers of toes, such as cows,
sheep, pigs, giraffes, and hippopotami
- Primates - monkeys, gorillas, lemurs, humans
- Rodentia - mice, squirrels, capybaras, beavers
- Carnivora - dogs, cats, mongooses, skunks
We expect the species within an order to be more similar to one another
than they are to species in other orders, but since order is a fairly high
level in the hierarchy the species within an order can be fairly different
from one another.
The next level below order is family, and the data set we are working
with has two families within each order:
- The families within Artiodactyla are the Bovidae (animals with
permanent horns, like cows, sheep, and antelope) and the Cervidae
(animals with antlers that are shed each year, like deer and elk)
- The families within Primates are the Cebidae (new-world monkeys like
capuchins and squirrel monkeys) and the Cercopithecidae (old-world
monkeys)
- The families within the Rodentia are the Caviidae (guinea pigs, cavies,
and capybaras) and the Muridae (mice and rats)
- The families within the Carnivora are the Canidae (dogs, wolves,
foxes) and the Felidae (cats such as lions, tigers, and ocelots)
We would expect less variation between species that are members of the
same family than between species that are members of the same order,
because species within the same family are more closely related on average
than those in the same order. The same basic expectation would hold for
members of the same genus - there are three genera for each family and two
species per genus in this data set. The two species in the same genus are
quite closely related, and should be fairly similar in many ways (size,
physiology, genetic composition, etc.).
To make the analysis simple, I selected an equal number of families
within each order, genera within each family, and species within each
genus. As such, we are working with a sample of the possible members of
each taxonomic level, and we will thus treat each level as a random effect
- the purpose of the analysis will not be to test for differences between
these particular orders, families, and genera, but to see how much
variation there is between random samples of orders, families, and genera
of mammals. Once we have our ANOVA table, we will estimate the variance
components for each level, so we can see if our expectation of decreasing
amounts of variation as we go from order through genus is upheld.
We will start by analyzing the log of gestation length:
1. Fit a nested ANOVA for log gestation length using the command:
aov(loggest ~ Error(order + family + Genus), data = mammal)
-> mammal.loggest.aov
You will get a scary red warning message about the Error() term being
singular - don't worry, this has to do with the fact that we do not have a
crossed (factorial) design, and there are many combinations of the three
different levels that don't occur. In spite of this warning message the
calculations are done correctly for a nested ANOVA.
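Before interpreting the table, it can be worth confirming that the design is balanced the way the text describes - a quick tabulation (a sketch, assuming the column names order, family, and Genus used above) shows the counts at each level:

```r
# Confirm the balanced nesting: 2 families per order, 3 genera per family,
# and 2 species per genus should give 12 species in each order.
table(mammal$order)                              # species per order
with(mammal, rowSums(table(order, family) > 0))  # families per order
with(mammal, rowSums(table(family, Genus) > 0))  # genera per family
```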
2. Use the summary() command to get your ANOVA table - it should look
like this for log gestation:
Error: order
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  3  21.42    7.14

Error: family
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  4  2.668   0.667

Error: Genus
          Df Sum Sq  Mean Sq F value Pr(>F)
Residuals 16  1.104  0.06902

Error: Within
          Df Sum Sq  Mean Sq F value Pr(>F)
Residuals 24 0.2109 0.008789
3. Calculate the variance components for each level - Order, Family,
Genus, and Species. The expected mean squares for each level are:
Random effect | Expected mean squares                                   | Estimated by:
--------------|---------------------------------------------------------|----------------------------
Order         | nms·σ²(order) + nm·σ²(family) + n·σ²(genus) + σ²(error) | Residual for Error: order
Family        | nm·σ²(family) + n·σ²(genus) + σ²(error)                 | Residual for Error: family
Genus         | n·σ²(genus) + σ²(error)                                 | Residual for Error: Genus
Species       | σ²(error)                                               | Residual for Error: Within
To calculate the variance components for each taxonomic level you need to
isolate its σ² - for example, to isolate σ²(order) you would need to
subtract MS(family) from MS(order), and then divide by the number of data
points in the mean for each order. With n = 2 species per genus, m = 3
genera per family, and s = 2 families per order, there are nms = 12
species used to calculate the mean for each order. The calculation would
thus be:

σ²(order) = (MS(order) - MS(family))/12 = (7.14 - 0.667)/12 = 0.539

With six species included in each family, the calculation for family is:

σ²(family) = (MS(family) - MS(genus))/6 = (0.667 - 0.06902)/6 = 0.0996

Each genus has two species recorded, so the calculation is:

σ²(genus) = (MS(genus) - MS(error))/2 = (0.06902 - 0.008789)/2 = 0.0301

Since each row of the table is a different species, the σ²(error) term is
also σ²(species). The σ²(error) component is estimated directly by
MS(error), which is equal to 0.008789, with no algebra needed!
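To avoid arithmetic slips, you can do these same calculations in R - a small sketch using the mean squares from the table above:

```r
# Variance components from the mean squares, with n = 2 species per genus,
# m = 3 genera per family, and s = 2 families per order.
ms.order  <- 7.14
ms.family <- 0.667
ms.genus  <- 0.06902
ms.error  <- 0.008789
(var.order   <- (ms.order  - ms.family)/12)  # nms = 12 species per order
(var.family  <- (ms.family - ms.genus)/6)    # nm  = 6 species per family
(var.genus   <- (ms.genus  - ms.error)/2)    # n   = 2 species per genus
(var.species <- ms.error)                    # estimated directly by MS(error)
```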
4. Now repeat steps 1 to 3 using logmass.
That's it! Answer the questions on the assignment sheet, and upload your
R output in a Word file to the class web site.