Experiments that test for the effects of more than one treatment at once are more efficient than experiments that test one treatment at a time. Experiments with more than one predictor are also the only way we can see how the effect of one treatment changes when it is combined with another. Factorial designs look just like the randomized complete block designs we encountered earlier - every combination of factor levels for the two treatments is used, with equal numbers of observations in each group. With this design it is possible to assess the effects of each of the treatments independently, because their effects are orthogonal to one another.

When we analyze a factorial experiment, we make a distinction between the main effects of the predictors and interactions between them. Main effects are measured using marginal means. Marginal means are simply the means for levels of just one of the treatments, ignoring the levels of the other. Interactions are measured based on the subgroup means, which are means for combinations of the treatment levels for both of the variables.
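If it helps to see the distinction in code, here is a quick sketch using a made-up two-factor data set - the factor names and numbers below are purely hypothetical, chosen only to illustrate the calculation:

# hypothetical balanced data: 2 levels of A, 2 levels of B, 2 observations per cell
d <- data.frame(A = rep(c("a1", "a2"), each = 4),
                B = rep(c("b1", "b2"), times = 4),
                y = c(3, 5, 8, 10, 4, 6, 5, 7))
aggregate(y ~ A, data = d, FUN = mean)       # marginal means for A, ignoring B
aggregate(y ~ A + B, data = d, FUN = mean)   # subgroup means for every A x B combination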

This distinction between main effects and interactions between variables is both one of the most powerful aspects of linear modeling, and one of the most easily misunderstood. A statistical interaction means that the effect one predictor has on the response depends on the level of the other predictor.

We would use models with interactions to test for biological effects such as:

Possible outcomes - main effects and interactions

To get an idea of what an interaction is testing for, consider the example from your book of an experiment on the amount of fat eaten by male or female rats. The fat fed to the rats was either fresh or rancid. Each combination of fat type (fresh, rancid) and sex of rat (male, female) was tested, with equal numbers of rats in each combination - that is, the design was complete and balanced.

Main effect of sex:

Main effect of fat type:

Interaction between sex and fat type:

ANOVA table

The graph on the right is an interaction plot, in which one of the predictor variables (sex) is on the x-axis and the other predictor (fat type) is represented by separate lines. There are additional symbols on the graph that are not usually included in an interaction plot, but they are there to indicate the locations of the marginal means for each sex (pink dot for the female marginal mean, baby blue dot for the male marginal mean - cheesy, I know, but it will help you remember what they are) and for each fat type (the stars in the middle of the lines for each fat type). The symbols showing marginal means are used to calculate main effects - if the pink and blue dots are not at the same location along the y-axis there may be a main effect of sex, and if the stars are not at the same location along the y-axis there may be a main effect of fat type. The effects are expressed as differences between the relevant means - for example, the main effect of sex of 35.5 says that the difference between the male and female marginal means is 35.5 g.

Initially the graph shows the actual data. Before we change anything, let's interpret the actual observed results.

Now that you're oriented, let's start making some changes.

Feel free to play around with the numbers until you're able to tell, just by looking at the graph, whether the main effects and interaction will be significant. Bear in mind that actual interaction plots will not include dots for the marginal means, so use them here as a learning tool; when interpreting real interaction plots you'll need to make educated guesses about where the marginal means fall for each level of the predictor on the x-axis, and for each of the lines representing the levels of the second predictor. (Of course, the final word on whether effects are significant will be the p-values in the ANOVA table, but learning to read the graphs will help you understand the statistical results.)

Since the observed results only show a significant effect of fat type, the interpretation is very simple - the marginal means of rancid and fresh fat differ. But, since we now know that significant interactions complicate interpretation, we will need to wait for an example with a significant interaction to learn how to proceed. This will be the subject of this week's activity.

Instructions

Today you will conduct an analysis on some contrived data from a plant growth experiment. You are probably well aware that plants need sunlight and water to grow. Providing a plant with just one of these two requirements is not sufficient to keep it alive. Given this, if we were to conduct an experiment in which we varied water and sunlight as two predictor variables, the amount of growth we will get in the plants should be much greater when both sunlight and water are provided.

Our two predictors, then, are light (direct, indirect, or none) and water (adequate or none).

The response variable will be the height of the plants.

Start a new project in R Studio for this exercise. The Rmd file you need is here, and the data you will use are in this file.

Import the data from interaction_examples.xls into a data set called "plant". If you view the data set, you'll see a column for the dependent variable (height), and one each for water and light. Every combination of light level and water level is present, with an equal number of replicates (3) used for each combination - thus, the experiment is complete and balanced, which makes the predictors orthogonal to one another.
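If you haven't imported an Excel file in a while, a minimal sketch of one way to do it is below, assuming the readxl package and that the data are on the first sheet of the workbook (the provided Rmd file may already have an import chunk that does this differently):

library(readxl)                                  # assumption: using readxl to read the .xls file
plant <- read_excel("interaction_examples.xls")  # imports the first sheet into a data set called "plant"
table(plant$water, plant$light)                  # check balance: should show 3 replicates per combination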

1. Before you run the analysis, predict what the results will be. On the blank axes to the left, draw in the results you would expect from this experiment. Think first about the main effects - would you expect the marginal means for the two water levels to differ, and would you expect the marginal means for the three light levels to differ?

What about an interaction? If the lines cross, converge, or diverge as you move from the left to the right side of the graph, that indicates that the amount of difference between the water levels depends on the light level, and you should expect an interaction between light and water.

Once you are done, click the "Download" button - the file you download should be called "graph.png". Place it in your project folder (you should see it in your Files tab). Your Rmd file will add your drawing to your knitted output file automatically, as long as it is in the same project folder as the Rmd file and is named graph.png.
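For reference, the line in the Rmd file that pulls the drawing in typically looks something like this (the provided file may use different syntax, such as a Markdown image link, to do the same thing):

knitr::include_graphics("graph.png")   # embeds graph.png in the knitted output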

Now, on to the data analysis.

2. First, you should look at your data. We will use ggplot2 to make an interaction plot that uses two different grouping variables and a numeric response. This interaction plot will look much like the hand-drawn graph you just made - it will have means for each possible combination of light and water level, color coded by water level and with lines connecting means for a water level across the three light levels.

We will need means and standard errors, so use summarySE to get those, and assign them to an object called plant.sumstats (in your Rmd file, summarize.plant.data chunk - don't forget you need to load the Rmisc library to get the function):

library(Rmisc)   # provides summarySE()
plant.sumstats <- summarySE(plant, measurevar = "height", groupvars = c("water","light"))

If you look at the data you should see:

     water    light N   height       sd       se       ci
1 adequate   direct 3 16.10000 2.847806 1.644182 7.074343
2 adequate indirect 3 29.26667 3.847510 2.221361 9.557745
3 adequate     none 3  5.20000 1.352775 0.781025 3.360479
4     none   direct 3  5.50000 1.873499 1.081665 4.654031
5     none indirect 3  5.40000 2.163331 1.249000 5.374012
6     none     none 3  4.40000 2.971532 1.715615 7.381694

The two grouping variables you provided are the first two columns of plant.sumstats, and you can see that each row is a different combination of water and light.

Now you can graph the means for each combination of light (on the x-axis) and water (grouped and color-coded), with lines connecting the means from the same water level (Rmd file, graph.plant.data chunk):

library(ggplot2)   # for ggplot()
ggplot(plant.sumstats, aes(x = light, y = height, color = water, group = water)) +
  geom_point() +
  geom_errorbar(aes(ymin = height - ci, ymax = height + ci), width = 0.1) +
  geom_line()

We have been writing ggplot() commands all strung together in a line, but it works to hit a carriage return after each + so that the command is easier to read.

Based on the graph you should have some idea of what is going to happen when you fit the model - the lines are clearly not parallel, so we have reason to expect an interaction.

3. Fit the model. The complete model specification for a GLM with an interaction would be:

height ~ light + water + light:water

Main effects are indicated by the names of the predictors added together, and the interaction is written with a colon between the two predictors (light:water). The interaction term really is a product in the GLM: it is represented as dummy-coded columns multiplied together (when both predictors are categorical), as a numeric column multiplied by dummy-coded columns (if one is categorical and the other is numeric), or as two numeric columns multiplied together (if both predictors are numeric).
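If you'd like to see those multiplied columns for yourself, you can inspect the design matrix R builds from the formula - this is just an optional check, assuming the plant data set has already been imported:

# the interaction columns (e.g. lightindirect:waternone) are products of the dummy-coded columns
head(model.matrix(~ light + water + light:water, data = plant))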

But, linear models are hierarchical, meaning that you shouldn't include an interaction term without also including main effects for each of the factors. Because of this, R provides a shorthand that shortens and simplifies the model specification: the asterisk in light*water expands to the main effects plus their interaction (light + water + light:water) - thus, all you need is (Rmd file, fit.lm chunk):

lm(height ~ light*water, data = plant) -> plant.lm

Writing light*water in the model formula includes the main effects along with the interaction, so it isn't necessary to write them in.

The design is balanced and complete, meaning that there are equal numbers of data points in each combination of light and water, and every combination is included in the data set. Because of this, light and water are orthogonal, and it won't matter whether you use Type I or Type II sums of squares. You can confirm this by getting both a Type I SS ANOVA table (from anova()) and a Type II SS ANOVA table (from Anova(), in the car library). Run each and confirm that they give the same results (you can put these commands in the fit.lm chunk below your lm() command).
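The two tables would come from commands like these (a minimal sketch - adjust to fit your own chunk):

anova(plant.lm)    # Type I (sequential) sums of squares

library(car)       # provides Anova()
Anova(plant.lm)    # Type II sums of squares (the default for Anova())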

4. We definitely have a statistically significant interaction between light and water, so now what?

The significant interaction means that we can't interpret the main effects as some sort of general pattern. For example, if we just did Tukey post-hocs on light level, we would get a significant increase in height for indirect light compared with the other two light levels, but this would be entirely due to the high mean obtained when adequate water is provided - if you don't water the plants you don't get this positive effect of indirect light. Given that we are going to have to explain this contingent relationship between water and light anyway, the main effects don't really add anything to our understanding of the biology of the system. To understand what's going on, we need to compare pairs of means for combinations of light and water.

But, which comparisons to make depends on what we really want to know. Our choice of post-hoc analysis will depend on the question we're interested in answering:

Question 1: what is the best combination of light and water?

Our graph shows that the combination of adequate water and indirect light gives us the tallest plants, but this combination may not be statistically significantly different from all of the other means. We can find this out by comparing the means for all of the combinations, using Tukey tests.

The simplest method of comparing all the combinations of factor levels to one another in R is to use emmeans() with tukey ~ light*water to define the comparisons to be made - this argument will compare heights for every combination of light and water. In the tukey.light.water chunk of the Rmd file enter:

library(emmeans)
emmeans(plant.lm, tukey ~ light*water) -> plant.emm
plant.emm

You will see this output:

$emmeans
 light    water    emmean   SE df lower.CL upper.CL
 direct   adequate   16.1 1.52 12    12.78    19.42
 indirect adequate   29.3 1.52 12    25.95    32.59
 none     adequate    5.2 1.52 12     1.88     8.52
 direct   none        5.5 1.52 12     2.18     8.82
 indirect none        5.4 1.52 12     2.08     8.72
 none     none        4.4 1.52 12     1.08     7.72

Confidence level used: 0.95

$contrasts
 contrast                            estimate   SE df t.ratio p.value
 direct adequate - indirect adequate    -13.2 2.15 12  -6.112  0.0006
 direct adequate - none adequate         10.9 2.15 12   5.060  0.0029
 direct adequate - direct none           10.6 2.15 12   4.921  0.0037
 direct adequate - indirect none         10.7 2.15 12   4.967  0.0034
 direct adequate - none none             11.7 2.15 12   5.431  0.0016
 indirect adequate - none adequate       24.1 2.15 12  11.172  <.0001
 indirect adequate - direct none         23.8 2.15 12  11.033  <.0001
 indirect adequate - indirect none       23.9 2.15 12  11.079  <.0001
 indirect adequate - none none           24.9 2.15 12  11.544  <.0001
 none adequate - direct none             -0.3 2.15 12  -0.139  1.0000
 none adequate - indirect none           -0.2 2.15 12  -0.093  1.0000
 none adequate - none none                0.8 2.15 12   0.371  0.9988
 direct none - indirect none              0.1 2.15 12   0.046  1.0000
 direct none - none none                  1.1 2.15 12   0.511  0.9947
 indirect none - none none                1.0 2.15 12   0.464  0.9966

P value adjustment: tukey method for comparing a family of 6 estimates

You can see that each row of the Tukey tests table is comparing one combination of light and water to another, and every combination is compared against every other. You can get a compact letter display by using the cld() function in the multcomp library - in the same code chunk, enter:

multcomp::cld(plant.emm)

and you will see that the following is added below the Tukey comparisons:

 light    water    emmean   SE df lower.CL upper.CL .group
 none     none        4.4 1.52 12     1.08     7.72  1   
 none     adequate    5.2 1.52 12     1.88     8.52  1   
 indirect none        5.4 1.52 12     2.08     8.72  1   
 direct   none        5.5 1.52 12     2.18     8.82  1   
 direct   adequate   16.1 1.52 12    12.78    19.42   2  
 indirect adequate   29.3 1.52 12    25.95    32.59    3 

Confidence level used: 0.95
P value adjustment: tukey method for comparing a family of 6 estimates
significance level used: alpha = 0.05
NOTE: Compact letter displays can be misleading
      because they show NON-findings rather than findings.
      Consider using 'pairs()', 'pwpp()', or 'pwpm()' instead.

You'll see the groups identified in the .group column. You can also see that the authors of emmeans disagree with the authors of multcomp about the wisdom of using compact letter displays!

This approach of comparing every combination of light and water is the most information rich, because every pair of means is tested. The problem is that with 2 water levels and 3 light levels there are six different combinations, and it takes 15 different comparisons (with 15 p-values) to test all of the means against each other. This is an issue because the p-value is adjusted for the number of comparisons, and comparing all of the means against one another reduces the power of each comparison. If you really want to compare all of the combinations, this is unavoidable, but it may be that you can answer all the questions you care about without doing all of these comparisons.
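To make the counting concrete (this is just arithmetic, not specific to this data set):

choose(6, 2)       # 15 pairwise comparisons among all 6 cell means
2 * choose(3, 2)   # 6 comparisons if, as in Question 2 below, light levels are only compared within each water level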

For example, if your question is...

Question 2: at each level of water, which light levels are different?

then you can reduce the number of comparisons that you make by more than half. If we only make comparisons between the three light levels that have adequate water, and the three light levels that have no water, we only have to test a total of six differences. Using this approach sacrifices comparisons between water levels, so we can't talk about the differences between water levels when we present our results, but if we already know that watering plants is important and are only interested in how water level and light level interact, then this may be a good choice. Fortunately, emmeans() provides a very simple way to do these comparisons (Rmd file, light.given.water.level chunk):

emmeans(plant.lm, tukey ~ light | water)

You'll see in the output window that you get lines that look like this:

$emmeans
water = adequate:
 light    emmean   SE df lower.CL upper.CL
 direct     16.1 1.52 12    12.78    19.42
 indirect   29.3 1.52 12    25.95    32.59
 none        5.2 1.52 12     1.88     8.52

water = none:
 light    emmean   SE df lower.CL upper.CL
 direct      5.5 1.52 12     2.18     8.82
 indirect    5.4 1.52 12     2.08     8.72
 none        4.4 1.52 12     1.08     7.72

Confidence level used: 0.95

$contrasts
water = adequate:
 contrast          estimate   SE df t.ratio p.value
 direct - indirect    -13.2 2.15 12  -6.112  0.0001
 direct - none         10.9 2.15 12   5.060  0.0008
 indirect - none       24.1 2.15 12  11.172  <.0001

water = none:
 contrast          estimate   SE df t.ratio p.value
 direct - indirect      0.1 2.15 12   0.046  0.9988
 direct - none          1.1 2.15 12   0.511  0.8676
 indirect - none        1.0 2.15 12   0.464  0.8891

P value adjustment: tukey method for comparing a family of 3 estimates

The output is divided into a set of comparisons for adequate water and another for no water, which makes it appear that the data are being split and analyzed separately, but this is not the case - the same standard error (based on the error term from the full model, with 12 df) is used for every comparison, so all of the data contribute to each test. Notice, though, that the p-value adjustment reported at the bottom is the Tukey method for a family of 3 estimates - the adjustment is applied within each water level separately rather than across all six comparisons at once, which is why these p-values are smaller than the corresponding ones from the full set of 15 comparisons.

The command is smart enough to work with either variable set as the conditional variable - try it using light as the conditional and water as the test variable and see what happens (Rmd file, water.given.light.level chunk).
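The call mirrors the one above, with the roles of the two predictors swapped:

emmeans(plant.lm, tukey ~ water | light)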

Interactions vs. correlated predictors

There's a question in your Rmd file about the difference between interactions and correlated predictors - before you try to answer it read through this explanation of the issue. It's easier to see the point if we have an equal number of levels in the light and water data set, so the graphs show three levels of water - I added a "moderate" level, which is just halfway between the adequate and none water levels at each light level.

Interaction plot
3D plot, interaction

We set up a factorial design using all possible combinations of light level and water level, with equal numbers of replicates in each cell (i.e. each combination of light and water). The graphs to the left show the patterns of response in both a (now familiar) interaction plot, and as a three-dimensional bar chart.

The interaction plot shows the lack of parallelism in the responses (i.e. heights) across the combinations of the predictors, but the 3D plot shows the design of the experiment better - you can see that all the combinations of light and water were used (the design is complete), and assuming the sample size is the same for all of the cells we can measure the effect of water independently of our measurement of the effect of light - the design is orthogonal.

Because we have an orthogonal design, we can see that the response to one predictor (e.g. light) depends on the level of the other (e.g. water) - if no light is provided then the effect of water is essentially zero, but if either direct or indirect light is used then the plants grow better with more water, the effect being greatest for indirect light.

Correlated predictors
Correlated predictors in 3D

But, what if instead of a nice balanced, complete design we had used only the combinations direct.adequate, indirect.moderate, and none.none? You can see on the 3D plot that this introduces a correlation between the predictors that is so severe that the predictors are completely confounded - each water level is matched with just one light level, so it's impossible to tell if the response is due to water alone, light alone, or an interaction between the two.

So, correlated predictors are a design-level problem that makes it difficult to tell how each predictor affects the response variable. Interactions measure the extent to which response to one variable depends on another, and we can't even assess whether interactions exist unless we have enough independent variation between the predictors to measure their joint effects.

That's it - knit and upload your Word file.