Analyze Data: R and RStudio

Contributors: Lindsay Plater, Michelle Dollois and Angelica Nascimento de Oliveira

Introduction to factorial ANOVA

A factorial Analysis of Variance (ANOVA) is used to determine whether the means from two or more variables / factors with two or more levels each differ (i.e., main effects), and whether any of these variables and / or factors interact (i.e., interactions). If there are two factors, this is sometimes called a “two-way” ANOVA; if there are three factors, this is sometimes called a “three-way” ANOVA (et cetera).

There are three separate kinds of factorial ANOVA: fully between factorial ANOVA (where both or all factors have levels that are entirely between-group), fully within factorial ANOVA (where both or all factors have levels that are entirely within-group), and mixed factorial ANOVA (where one or more factors are entirely between-group AND one or more factors are entirely within-group).

Note that all three kinds of factorial ANOVA are parametric tests; if the assumptions of the test are not met, there is no non-parametric equivalent. You could consider transforming your dependent variable in some way and then running the parametric test (and all assumptions) again, though this does not guarantee normality.

Fully between factorial ANOVA

This is an extension of a one-way ANOVA, including two or more factors with two or more levels each that are fully “between” (i.e., each participant / sample is in only one condition). This test assumes that each participant / sample is in only one condition (independence of observations), that there are no outliers, that your dependent variable is approximately normally distributed in each condition (normality), and that each condition has approximately equal variances (homogeneity).

This test requires a dataframe with one continuous column (DV), and one condition or grouping column (IV) per factor in the design. The data should be arranged in long format (for a visualization of long format data, see the one-way repeated measures ANOVA section of the LibGuide). 

It is important before running any factorial ANOVA that the settings in R are set properly. In particular, we must set the contrasts. If this isn’t done, the results of any factorial ANOVA may be incorrect. To set the contrasts properly, run the line: options(contrasts = c("contr.sum", "contr.poly")).
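For reference, a minimal sketch of this setup step:

    # Set sum-to-zero contrasts so factorial ANOVA results are correct
    options(contrasts = c("contr.sum", "contr.poly"))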

The RStudio interface, with the options() function in the source window to set the contrasts, the glimpse() function in the source window to check the data types, and several variables displayed in the console.

Testing assumptions

Normality

To determine whether the data being used in your ANOVA are normally distributed, run a Shapiro-Wilk test on each cell of the data. Cells are defined by the factors in your design: each combination of the levels of the factors creates a cell. For example, if your design has one factor with two levels and one factor with four levels, their combinations create eight cells. The example data used in this tutorial have eight cells.

                        Factor 2
                        Level 1        Level 2        Level 3        Level 4
Factor 1   Level 1      1,1 (cell 1)   1,2 (cell 2)   1,3 (cell 3)   1,4 (cell 4)
           Level 2      2,1 (cell 5)   2,2 (cell 6)   2,3 (cell 7)   2,4 (cell 8)

To test normality, you’ll need to first separate cell data into separate variables, and then run a Shapiro-Wilk test (shapiro.test()) on the dependent variable for each of these variables (i.e., on each group). A significant Shapiro-Wilk statistic (p < .05) for one or more of your groups indicates that you have failed the test of normality.

Since data in R is usually arranged in long format, with one column containing all dependent variable values and one column for each independent variable containing labels, we first need to separate the condition cells so that we can test normality on each. We can do this using the filter() function from the tidyverse package. When using the filter() function, first specify the dataframe that contains your independent and dependent variable data, then give a logical statement specifying which rows in the dataframe to keep. To build the logical statement, specify the column you wish to filter on (i.e., your grouping / independent variable column), followed by a double equals sign (==), and then indicate which content in that column to keep: filter(NAMEOFDATAFRAME, NAMEOFIVCOLUMN == "GROUP"). If you are filtering on more than one thing at a time, separate the arguments in the filter() function with a comma; all arguments must be true for a row to be kept.

NOTE: the filter() function is case sensitive (i.e., a lowercase “a” is not the same as an uppercase “A”) and values must match exactly in order to work.
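As a sketch, assuming a dataframe named mydata with the grouping columns Gender and Colour (as in the screenshot below) and the hypothetical level labels "Woman" and "Red":

    # Load tidyverse for filter() and the %>% pipe used later
    library(tidyverse)
    # Keep only rows where Gender is "Woman" AND Colour is "Red" (one cell)
    woman_red <- filter(mydata, Gender == "Woman", Colour == "Red")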

The RStudio interface, with the filter() function in the source window being used to filter by the grouping variables "Gender" and "Colour".

NOTE: your colour words may appear highlighted in a given colour; this is a feature of certain versions of RStudio.

Now that the data have been separated, we can use these eight new dataframes with the shapiro.test() function to test normality of the dependent variable columns for each condition using: shapiro.test(NAMEOFDATAFRAME$NAMEOFDVCOLUMN).
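Continuing the sketch above (the dataframe woman_red and the DV column Fake_Data1 are assumed names):

    # Shapiro-Wilk test of normality for one cell; repeat for all eight cells
    shapiro.test(woman_red$Fake_Data1)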

The RStudio interface, with code for a Shapiro-Wilk normality test in the source window. The results of three of the eight tests are shown (to pass normality, p > .05).

NOTE: though only three tests are shown in the image above, the results of all eight tests are important because there are eight groups in the data.

Since all tests returned p > .05, we have satisfied the normality assumption and can continue with the factorial ANOVA. 

Homogeneity of variance

The second assumption that we can test for in R is the assumption that variances do not differ between groups, also referred to as homogeneity of variance or homoscedasticity. This is typically tested using either a Levene’s test or a Bartlett test. Here, we walk through how to conduct a Levene’s test in R using the leveneTest() function from the car package, which makes it easy to handle more than one IV. Importantly, we are not going to load the whole car package, as it will interfere with some of the functions in tidyverse. Instead, to access only the leveneTest() function, we can use car:: to select a function within the car package without loading the entire package (NOTE: the package will need to be installed on your device in order for this to work). In the function, give the dependent and independent variable columns as a formula (DV ~ IV1 * IV2), with the independent variables separated by an asterisk, and specify the dataframe they are from: car::leveneTest(NAMEOFDVCOLUMN ~ NAMEOFIVCOLUMN1 * NAMEOFIVCOLUMN2, data = NAMEOFDATAFRAME). Note that we are using the dataframe that has all groups’ data for this test.
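A sketch with the assumed names from above:

    # Levene's test across all Gender x Colour cells (car must be installed)
    car::leveneTest(Fake_Data1 ~ Gender * Colour, data = mydata)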

The RStudio interface, with code for a Levene's homogeneity test in the source window. The result of this test passes homogeneity (to pass, p > .05).

Since p > .05, we have satisfied the assumption.

How to run a fully between factorial ANOVA

You can use the aov() function to run a fully between factorial ANOVA. This function requires a formula specifying the dependent and independent variables (DV ~ IV1 * IV2) and the dataframe they come from: aov(NAMEOFDVCOLUMN ~ NAMEOFIVCOLUMN1 * NAMEOFIVCOLUMN2, data = NAMEOFDATAFRAME). Save this model into a variable using the <- operator, then use the summary() function to see the ANOVA table: summary(NAMEOFMODEL).
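Using the assumed names from the sketches above:

    # Fit the 2 x 4 between-subjects factorial ANOVA and view the ANOVA table
    between_model <- aov(Fake_Data1 ~ Gender * Colour, data = mydata)
    summary(between_model)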

The RStudio interface, with code for a fully between factorial ANOVA in the source window. The results of the ANOVA in the console show a non-significant main effect of Gender, a non-significant main effect of Colour, and a non-significant interaction of Gender*Colour.

Interpreting the Output

Running the above steps will generate the following output:

A close-up of the fully between factorial ANOVA output, showing a non-significant main effect of Gender, a non-significant main effect of Colour, and a non-significant interaction of Gender*Colour.

Note that here there are four rows: one for the main effect of each IV (here, two), one for the interaction between IVs (here, one), and one for the residuals (also called the error), as well as many columns:

  • Df: This column gives the degrees of freedom for each effect (df1) and for the residuals (df2). df1 = k-1, where k is the number of groups / levels for that factor; df2 = k(n-1), where n is the number of participants or observations per cell and k is the total number of cells. Note that df1 may vary based on which effect we are considering (i.e., if your factors have differing numbers of levels or groups).
  • Sum Sq: This column gives the sums of squares (SS) for the IVs and residuals.
  • Mean Sq: This column gives the mean square (MS) for the IVs and residuals.
  • F value: This column gives the test statistic (F) and its value.
  • Pr(>F): This column gives the p-value indicating whether the test is significant (p<.05), or, said another way, whether there is a difference somewhere between the levels of the factor(s).

This ANOVA shows a non-significant main effect of Gender (p > .05), a non-significant main effect of Colour, and a non-significant interaction between Gender and Colour. Importantly, when p > .05, we cannot say that there is “no difference” between the level(s); we can only say that we failed to find a difference. If p < .05, we could conclude that there is a difference somewhere between the levels, though we would need to follow this up with additional testing to determine which levels are significantly different from each other.

Effect size

The most appropriate effect size measure to use with factorial ANOVAs is partial eta-squared. An effect size can be calculated for each of the effects in the test with the formula: partial eta-squared = SSEffect / (SSEffect + SSResiduals). All of these components can be found in the ANOVA table (see the sketch after this list):

  • Effect row, Sum Sq column: SSEffect
  • Residuals row, Sum Sq column: SSResiduals
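As a sketch, the sums of squares can be pulled out of the summary table rather than retyped by hand. The row order here assumes the model formula Gender * Colour from the sketch above; check your own output:

    # Extract the Sum Sq column; rows: Gender, Colour, Gender:Colour, Residuals
    ss <- summary(between_model)[[1]][["Sum Sq"]]
    pes_gender      <- ss[1] / (ss[1] + ss[4])
    pes_colour      <- ss[2] / (ss[2] + ss[4])
    pes_interaction <- ss[3] / (ss[3] + ss[4])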

The RStudio interface, showing the calculation for partial-eta squared (an effect size) in the source window.

Post hoc tests

When running a factorial ANOVA, each effect that is tested requires its own set of post hoc tests IF the effect in the ANOVA was significant. It is recommended to first follow up on any significant interaction(s), because an interaction can create illusions with main effects. To explain, let’s consider the above example and pretend the interaction was significant. Depending on the pattern of the interaction, it may be that there is an effect of Colour on Fake_Data1 scores for women but not for men. The effect in women may be so large that we observe a main effect of Colour, despite the effect only being present for one group. In this case, there isn’t a true main effect, since it was not observed in both groups (men and women). Alternatively, if the effect of Colour goes in opposite directions for men and women, this will result in a non-significant main effect of Colour, despite Colour significantly influencing scores in both groups; Colour might not seem to have an effect when really there is something happening there. The final possibility with a significant interaction is that both groups show an effect of Colour in the same direction, but to different degrees. This last scenario will have a significant main effect that is not an illusion, but whose size does depend on the other IV.

Interaction

When following up on a significant interaction, the goal is to determine whether there is an effect of IV1 at each level of IV2. To accomplish this, you must first separate your data based on IV2. We can do this in R using the filter() function. Below, we will test whether there is an effect of Colour at each level of Gender, so the first step is separating the data for each level of Gender. With the split data, we can test for an effect of Colour in men and women separately. Because there are more than two levels in Colour, we will run this as two ANOVAs (one for each Gender). If there were only two levels of Colour, we could run t-tests (expert tip: in a fully between factorial ANOVA, these should be independent sample t-tests). These tests will tell us about the simple main effects in the data.
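A sketch of this split-then-test approach, assuming the Gender levels "Man" and "Woman" and the names used above:

    # Split the data by Gender, then test the effect of Colour in each half
    men   <- filter(mydata, Gender == "Man")
    women <- filter(mydata, Gender == "Woman")
    summary(aov(Fake_Data1 ~ Colour, data = men))
    summary(aov(Fake_Data1 ~ Colour, data = women))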

The RStudio interface, showing the post hoc analysis (here, a one-way ANOVA) for the fully between factorial ANOVA in the source window and the output in the console.

The output of each test can be interpreted like a standard one-way ANOVA, and if significant, can be followed up with pairwise tests, like Tukey’s HSD. See the one-way ANOVA section for more details.

Main effect

If you find a significant main effect for a factor that only has two levels, you can simply look at the descriptives to know which is higher / lower (as the main effect already tells you that these are different).

If your factorial ANOVA reveals a significant main effect for a factor that has more than two levels, you need to run pairwise comparisons to determine the location of the difference(s), as the main effect tells you there is a difference somewhere without specifying which groups might be different. Tukey’s HSD test is a common post hoc test that can compare every possible pair of groups in the design for differences. When following up on a main effect in a factorial design, we are interested in the differences between marginal means. In the case of the example, this means comparing the differences between the four Colour groups while ignoring Gender. Tukey’s HSD also adjusts p-values based on the number of post hoc comparisons being made to reduce the risk of Type I Error. To run Tukey’s HSD, use the TukeyHSD() function, give it the ANOVA model we made, and specify which effect we want to follow up on using the which argument: TukeyHSD(NAMEOFMODEL, which = "NAMEOFIVCOLUMN").
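With the assumed names from above:

    # Pairwise comparisons of the marginal Colour means, with adjusted p-values
    TukeyHSD(between_model, which = "Colour")
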
The RStudio interface, showing post-hoc TukeyHSD tests as a follow-up of a significant main effect in the fully between factorial ANOVA.

The output from this test will show the mean differences between each group (diff), the lower (lwr) and upper (upr) confidence intervals on the mean differences, and the p-value for each comparison adjusted for Type I Error (p adj). In this example, all six tests are non-significant (all ps > .05), so we cannot conclude that there are differences between the groups.

Fully within factorial ANOVA

This is an extension of a repeated measures ANOVA, including two or more factors with two or more levels each that are fully “within” (i.e., each participant / sample is in each condition). This test assumes that there are no outliers, that your dependent variable is approximately normally distributed in each condition (normality), and that the differences between all conditions are approximately equal (sphericity).

This test requires a dataframe with one continuous column (DV), one condition / grouping column (IV) per factor in the design, and one identification column (ID). The ID column is used to match observations across conditions. For example, if the data measured participants, this column would label which observations belong to Participant 1, Participant 2, et cetera. There should be one value for each participant in each condition. Use glimpse(NAMEOFDATAFRAME) to see the structure of your data; the <dbl> label indicates a continuous variable and the <fct> label indicates a categorical variable. The DV must be a continuous variable, and your IV and ID variables must be categorical variables. If your IV and ID columns aren’t factors (“<fct>”), use: NAMEOFDATAFRAME$NAMEOFCOLUMN <- as.factor(NAMEOFDATAFRAME$NAMEOFCOLUMN) to change them. Additionally, the data must be arranged in long format (for a visualization of long format data, see the one-way repeated measures ANOVA section of the LibGuide) so that it is compatible with the test functions.
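A sketch of this check-and-convert step, assuming a dataframe named within_data with the columns ID, Condition, Time, and Score used in this section:

    # Inspect column types: Score should be <dbl>; ID, Condition, Time should be <fct>
    glimpse(within_data)
    # Convert any non-factor ID / IV columns
    within_data$ID        <- as.factor(within_data$ID)
    within_data$Condition <- as.factor(within_data$Condition)
    within_data$Time      <- as.factor(within_data$Time)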

It is important before running any factorial ANOVA that the settings in R are set properly. In particular, we must set the contrasts. If this isn’t done, the results of any factorial ANOVA may be incorrect. To set the contrasts properly, run the line: options(contrasts = c("contr.sum", "contr.poly")).

The RStudio interface, with the options() function in the source window to set the contrasts, the glimpse() function in the source window to check the data types, and several variables displayed in the console.

Testing assumptions

Normality

To determine whether the data being used in your ANOVA are normally distributed, run a Shapiro-Wilk test on each cell of the data. Cells are defined by the factors in your design: each combination of the levels of the factors creates a cell. For example, if your design has one factor with three levels and one factor with two levels, their combinations create six cells. The example data used in this tutorial have six cells.

                        Factor 2
                        Level 1        Level 2        Level 3
Factor 1   Level 1      1,1 (cell 1)   1,2 (cell 2)   1,3 (cell 3)
           Level 2      2,1 (cell 4)   2,2 (cell 5)   2,3 (cell 6)

To test normality, you’ll need to first separate cell data into separate variables, and then run a Shapiro-Wilk test (shapiro.test()) on the dependent variable for each of these variables (i.e., on each group). A significant Shapiro-Wilk statistic (p < .05) for one or more of your groups indicates that you have failed the test of normality.

We first need to separate the condition cells so that we can test normality on each. We can do this using the filter() function from the tidyverse package. When using the filter() function, first specify the dataframe that contains your independent and dependent variable data, then give a logical statement specifying which rows in the dataframe to keep. To build the logical statement, specify the column you wish to filter on (i.e., your grouping / independent variable column), followed by a double equals sign (==), and then indicate which content in that column to keep: filter(NAMEOFDATAFRAME, NAMEOFIVCOLUMN == "GROUP"). If you are filtering on more than one thing at a time, separate the arguments in the filter() function with a comma; all arguments must be true for a row to be kept.

NOTE: the filter() function is case sensitive (i.e., a lowercase “a” is not the same as an uppercase “A”). Additionally, because the Time column in the data is a factor type, when matching its contents, the numbers must be written in quotation marks.

The RStudio interface, with the filter() function in the source window being used to filter by the grouping variables "Condition" and "Time".

Now that the data have been separated, we can use these six new dataframes with the shapiro.test() function to test normality of the dependent variable columns for each condition using: shapiro.test(NAMEOFDATAFRAME$NAMEOFDVCOLUMN).

The RStudio interface, with code for a Shapiro-Wilk normality test in the source window. The results of two of the six tests are shown (to pass normality, p > .05).

NOTE: though the output of only two tests is shown in the image above, the results of all six tests are important because there are six conditions in the data.

Since all tests returned p > .05, we have satisfied the normality assumption and can continue with the factorial ANOVA.

Sphericity

The assumption of sphericity requires that the variance of the differences between all “within” conditions is equal. This assumption is important when a repeated-measures factor has more than two levels, and we can test it when we run the ANOVA. When this assumption is violated, a correction can be applied to the test.

How to run a fully within factorial ANOVA

To run a fully within factorial ANOVA that also tests for violations of sphericity, we can use the ezANOVA() function from the ez package. The function can be specified as follows: ezANOVA(dv = NAMEOFDVCOLUMN, wid = NAMEOFIDCOLUMN, within = .(NAMEOFIVCOLUMN1, NAMEOFIVCOLUMN2), data = NAMEOFDATAFRAME, detailed = TRUE). Note that when giving more than one repeated-measures IV to the within argument, they are enclosed in parentheses with a period before the first bracket. By setting detailed = TRUE, the output will include both the test for sphericity and corrections in case sphericity is violated.

NOTE: the output from the ezANOVA() function sometimes uses scientific notation.
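A sketch with the assumed names from above (the ez package must be installed):

    library(ez)
    # Fully within ANOVA; detailed = TRUE adds the sphericity test and corrections
    within_model <- ezANOVA(data = within_data, dv = Score, wid = ID,
                            within = .(Condition, Time), detailed = TRUE)
    within_model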

The RStudio interface, with code for a fully within factorial ANOVA in the source window. The results of the ANOVA in the console show a significant main effect of Condition, a significant main effect of Time, and a non-significant interaction of Condition*Time.

Interpreting the Output

Running the above steps will generate the following output:

A close-up of the fully within factorial ANOVA output, showing a significant main effect of Condition, a significant main effect of Time, and a non-significant interaction of Condition*Time.

Note that the output is split into three sections: 1) ANOVA, 2) Mauchly’s Test for Sphericity, and 3) Sphericity Corrections.

  1. The ANOVA section here has four rows: one row for the intercept of the model (which we ignore), one for each main effect (here, two rows), and one for each interaction (here, one row). Going across the columns of the output:
    • Effect: This column labels which effect is being described in each row. If a row includes “:”, that indicates an interaction between two or more factors.
    • DFn: This column gives the degrees of freedom for the factor (df1). df1 = k-1, where k is the number of levels or conditions in the factor.
    • DFd: This column gives the degrees of freedom for the residuals or error (df2). df2 = (n-1) * (a-1) * (b-1), where n is the number of participants or observations, a is the number of levels in the first factor, and b is the number of levels in the second factor. When only one factor is included in the effect, the formula will only include the relevant factor [ for example: (n-1) * (a-1) ].
    • SSn: This column gives the sums of squares (SS) for the effect.
    • SSd: This column gives the sums of squares (SS) for the residuals.
    • F: This column gives the test statistic (F) and its value.
    • p: This column gives the p-value indicating whether the test is significant (p<.05), or, said another way, whether there is a difference somewhere between the levels of the factor(s).
    • p<.05: This column indicates which tests have p < .05 with a star.
    • ges: This column gives the general eta squared value for the effect – this is a measure of effect size.
  2. Mauchly’s Test for Sphericity
    • Effect: This column labels which IV is being tested for sphericity violations. Only “within” variables with more than two levels will appear here, including interactions that include these variables.
    • W: This column gives the test statistic (W) for Mauchly’s test and its value.
    • p: This column gives the p-value for each test. If p < .05, the sphericity assumption has been violated and a correction should be applied to the relevant effect(s).
    • p<.05: This column indicates which tests have p < .05 with a star.
  3. Sphericity Corrections. NOTE: this section appears regardless of whether the data have violated the sphericity assumption. Corrections only need to be used when the p-value(s) in section 2) are significant. Two common corrections are given: the Greenhouse-Geisser correction (indicated by “GG”) and the Huynh-Feldt correction (indicated by “HF”).
    • Effect: This column labels which effect the correction is for.
    • GGe / HFe: Epsilon estimate. These are measures of the degree to which the assumption has been violated. If reporting a sphericity correction, the epsilon can be multiplied by the degrees of freedom in the ANOVA section to obtain the new, corrected degrees of freedom.
    • p[GG] / p[HF]: The new p-value for the effect after corrections. If the correction is being applied, this is what should be reported.
    • p[GG]<.05 / p[HF]<.05: This column indicates which corrected tests have p < .05 with a star.

This output indicates that the sphericity assumption is violated for the Time effect. Therefore, we can use the output in the ANOVA section for the Condition effect and the Condition*Time interaction, but should use the adjusted p-value for the Time effect. With this in mind, this ANOVA shows a significant effect of Condition (p < .05), a significant effect of Time after the correction (p < .05), and a non-significant interaction (p > .05).

Importantly, when p > .05, we cannot say that there is “no difference” between the level(s); we can only say that we failed to find a difference. If p < .05, we could conclude that there is a difference somewhere between the levels, though we would need to follow this up with additional testing to determine which levels are significantly different from each other.

Effect size

The most appropriate effect size measure to use with factorial ANOVAs is partial eta-squared. An effect size can be calculated for each of the effects in the test with the formula: partial eta-squared = SSEffect / (SSEffect + SSResiduals). All of these components can be found in the ANOVA table:

  • SSn column: SSEffect
  • SSd column: SSResiduals

To help prevent human error, we can use code to extract these values out of the ANOVA output. If the ANOVA test has been stored in an object, we can access its contents by using $ANOVA to go into the ANOVA section, another $ operator to select the column, and square brackets [ ] to index the correct row. Notice that the rows in the output are numbered. For example: NAMEOFANOVA$ANOVA$NAMEOFCOLUMN[ROWNUMBER].
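For example, assuming the model object within_model from the sketch above, and assuming the Condition effect sits in row 2 of the ANOVA table (check the printed row numbers in your own output):

    # Partial eta-squared for the Condition effect
    ss_n <- within_model$ANOVA$SSn[2]
    ss_d <- within_model$ANOVA$SSd[2]
    pes_condition <- ss_n / (ss_n + ss_d)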

The RStudio interface, showing the calculation for partial-eta squared (an effect size) in the source window.

Post hoc tests

When running a factorial ANOVA, each effect that is tested requires its own set of post hoc tests IF the effect in the ANOVA was significant. It is recommended to first follow up on any significant interaction(s), because an interaction can create illusions with main effects. To explain, let’s consider the above example and pretend the interaction was significant. Depending on the pattern of the interaction, it may be that there is an effect of Time on Score in Condition A but not in Condition B. The effect in Condition A may be so large that we observe a main effect of Time, despite the effect only being present for one condition. In this case, there isn’t a true main effect, since it was not observed in both conditions (Condition A and Condition B). Alternatively, if the effect of Time goes in opposite directions in Condition A and Condition B, this will result in a non-significant main effect of Time, despite Time significantly influencing scores in both conditions; the interaction will initially hide that Time matters. The final possibility with a significant interaction is that both conditions show an effect of Time in the same direction, but to different degrees. This last scenario will have a significant main effect that is not an illusion, but whose size does depend on the other IV.

Interaction

When following up on a significant interaction, the goal is to determine whether there is an effect of IV1 at each level of IV2. To accomplish this, you must first separate your data based on IV2. We can do this in R using the filter() function. Below we will test whether there is an effect of Time at each level of Condition, so we will start by separating the data for each level of Condition.

In our separated data we can now test for an effect of Time in Condition A and Condition B separately. Because there are more than two levels in Time, we will run repeated measures ANOVAs (one for each Condition). If there were only two levels of Time, we could run t-tests (expert tip: in a fully within factorial ANOVA, these should be paired sample t-tests). These tests will tell us about the simple main effects in the data.

The RStudio interface, showing the post hoc analysis (here, a repeated-measures ANOVA) for the fully within factorial ANOVA in the source window and the output in the console.

The output can be interpreted like any repeated-measures ANOVA, and if significant, can be followed up with pairwise tests. See the one-way repeated-measures ANOVA section for more details.

Main Effect

If you find a significant main effect for a factor that only has two levels, you can simply look at the descriptives to know which is higher / lower (as the main effect already tells you that these are different).

If your factorial ANOVA reveals a significant main effect for a factor that has more than two levels, you need to run pairwise post hoc tests to determine the location of the difference(s), as the main effect tells you there is a difference somewhere without specifying which groups might be different. The first step in conducting repeated-measures pairwise post hoc tests is to collapse across the variable you are not interested in. In our example, to better understand the main effect of Time, we would first collapse across Condition. To do this, we can use the group_by() function and the summarise() function, grouping by the ID column and the IV of interest, and calculating the mean of the DV column. This will give us each participant’s mean Score at each Time point.
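A sketch with the assumed names:

    # Average over Condition to get each participant's mean Score at each Time
    collapsed <- within_data %>%
      group_by(ID, Time) %>%
      summarise(Score = mean(Score), .groups = "drop")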

The RStudio interface, showing how to use group_by() and summarise() functions to get the mean score for each participant at each time point.

Once the data have been collapsed, we can run pairwise t-tests on the Time conditions using the pairwise.t.test() function. This will test all possible combinations of the levels in the factor. To use this function, specify the DV and IV columns within the dataframe, indicate that these are paired comparisons because Time was a within factor, and apply a p-value correction to minimize the risk of Type I Error. For example: pairwise.t.test(x = NAMEOFDATAFRAME$DVCOLUMN, g = NAMEOFDATAFRAME$IVCOLUMN, paired = TRUE, p.adjust.method = "bonf"). By setting p.adjust.method to "bonf", the test will apply a Bonferroni correction, which multiplies each p-value by the number of comparisons being made.
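Continuing the sketch:

    # Paired pairwise t-tests across Time levels with Bonferroni correction;
    # rows must be ordered consistently by ID within each Time level
    pairwise.t.test(x = collapsed$Score, g = collapsed$Time,
                    paired = TRUE, p.adjust.method = "bonf")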

The RStudio interface, showing post-hoc pairwise comparisons using paired t-tests as a follow-up of a significant main effect in the fully within factorial ANOVA.

The output of the pairwise.t.test() function gives the p-values for the series of t-tests and indicates the correction applied at the bottom. To read this output, use the row and column headers to identify which p-values belong to which comparisons. In this example, all three t-tests are significant (all ps < .05), so we can conclude that there are differences between all groups.

Mixed factorial ANOVA

A mixed factorial ANOVA (also sometimes called a split-plot design) is a combination of a one-way ANOVA and a repeated measures ANOVA, where at least one of the factors is fully “between” (i.e., each participant / sample is in only one condition for this factor) and one of the factors is fully “within” (i.e., each participant / sample is in each condition for this factor). This test assumes that there are no outliers, that your dependent variable is approximately normally distributed in each condition (normality), that each condition has approximately equal variances (homogeneity), and that the differences between all conditions are approximately equal (sphericity).

This test requires a dataframe with one continuous column (DV), one condition / grouping column (IV) per factor in the design, and one identification column (ID). The ID column is used to match observations across conditions. For example, if the data measured participants, this column would label which observations belong to Participant 1, Participant 2, et cetera. There should be one value for each participant in each within-subject condition. Use glimpse(NAMEOFDATAFRAME) to see the structure of your data; the <dbl> label indicates a continuous variable and the <fct> label indicates a categorical variable. The DV must be a continuous variable, and your IV and ID variables must be categorical variables. If your IV and ID columns aren’t factors (“<fct>”), use: NAMEOFDATAFRAME$NAMEOFCOLUMN <- as.factor(NAMEOFDATAFRAME$NAMEOFCOLUMN) to change them. Additionally, the data must be arranged in long format (for a visualization of long format data, see the one-way repeated measures ANOVA section of the LibGuide) so that it is compatible with the test functions.

It is important before running any factorial ANOVA that the settings in R are set properly. In particular, we must set the contrasts. If this isn’t done, the results of any factorial ANOVA may be incorrect. To set the contrasts properly, run the line: options(contrasts = c("contr.sum", "contr.poly")).

The RStudio interface, with the options() function in the source window to set the contrasts, the glimpse() function in the source window to check the data types, and several variables displayed in the console.

Testing assumptions

Normality

To determine whether the data being used in your ANOVA are normally distributed, run a Shapiro-Wilk test on each cell of the data. Cells are defined by the factors in your design: each combination of the levels of the factors creates a cell. For example, if your design has one factor with three levels and one factor with two levels, their combinations create six cells. The example data used in this tutorial have six cells.

                        Factor 2
                        Level 1        Level 2        Level 3
Factor 1   Level 1      1,1 (cell 1)   1,2 (cell 2)   1,3 (cell 3)
           Level 2      2,1 (cell 4)   2,2 (cell 5)   2,3 (cell 6)

To test normality, you’ll need to first separate cell data into separate variables, and then run a Shapiro-Wilk test (shapiro.test()) on the dependent variable for each of these variables (i.e., on each group). A significant Shapiro-Wilk statistic (p < .05) for one or more of your groups indicates that you have failed the test of normality.

We first need to separate the condition cells so that we can test normality on each. We can do this using the filter() function from the tidyverse package. When using the filter() function, first specify the dataframe that contains your independent and dependent variable data, then give a logical statement specifying which rows in the dataframe to keep. To build the logical statement, specify the column you wish to filter on (i.e., your grouping / independent variable column), followed by a double equals sign (==), and then indicate which content in that column to keep: filter(NAMEOFDATAFRAME, NAMEOFIVCOLUMN == "GROUP"). If you are filtering on more than one thing at a time, separate the arguments in the filter() function with a comma; all arguments must be true for a row to be kept.

NOTE: the filter() function is case sensitive (i.e., a lowercase “a” is not the same as an uppercase “A”).

The RStudio interface, with the filter() function in the source window being used to filter by the grouping variables "Gender" and "Condition".

Now that the data have been separated, we can use these six new dataframes with the shapiro.test() function to test normality of the dependent variable columns for each condition using: shapiro.test(NAMEOFDATAFRAME$NAMEOFDVCOLUMN).

The RStudio interface, with code for a Shapiro-Wilk normality test in the source window. The results of two of the six tests are shown (to pass normality, p > .05).

NOTE: though the output of only two tests is shown in the image above, the results of all six tests are important because there are six cells in the data.

Since all tests returned p > .05, we have satisfied the normality assumption and can continue with the factorial ANOVA.

Homogeneity of variance

The second assumption that we can test for in R is the assumption that variances do not differ between groups, also referred to as homogeneity of variance or homoscedasticity. This assumption only applies to factors that are between-subjects. It is typically tested using either a Levene’s test or a Bartlett test. Here, we walk through how to conduct a Levene’s test in R using the leveneTest() function from the car package. Importantly, we are not going to load the whole car package, as it will interfere with some of the functions in tidyverse. Instead, to access only the leveneTest() function, we can use car:: to select a function within the car package without loading the entire package (NOTE: the package will need to be installed on your device in order for this to work). In the function, give the dependent and independent variable columns as a formula (DV ~ IV1), and specify the dataframe they are from: car::leveneTest(NAMEOFDVCOLUMN ~ NAMEOFIVCOLUMN, data = NAMEOFDATAFRAME). If there are multiple between-participant IVs, they can all be included, but must be separated with an asterisk (e.g., IV1 * IV2). Note that we are using the dataframe that has all groups’ data for this test.

Importantly for a mixed ANOVA, before running this test we will need to collapse across the levels of any within-participant variable(s). To do this, we can use the group_by() function and the summarise() function, grouping by the ID column and the between-subjects IV, and calculating the mean of the DV column.
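A sketch, assuming a dataframe named mixed_data with the columns ID, Gender (between), Condition (within), and Score used in this section:

    # Collapse across the within factor: one mean Score per participant
    gender_means <- mixed_data %>%
      group_by(ID, Gender) %>%
      summarise(Score = mean(Score), .groups = "drop")
    # Levene's test on the between factor only
    car::leveneTest(Score ~ Gender, data = gender_means)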

The RStudio interface, with code for a Levene's homogeneity test in the source window. The result of this test passes homogeneity (to pass, p > .05).

Since p > .05, we have satisfied the assumption.

Sphericity

The assumption of sphericity requires that the variance of the differences between all “within” conditions is equal. This assumption is important when a repeated-measures factor has more than two levels, and we can test it when we run the ANOVA. When this assumption is violated, a correction can be applied to the test.

How to run a mixed factorial ANOVA

To run a mixed factorial ANOVA that also tests for violations of sphericity, we can use the ezANOVA() function from the ez package. The function can be specified as follows: ezANOVA(dv = NAMEOFDVCOLUMN, wid = NAMEOFIDCOLUMN, within = NAMEOFWITHINIVCOLUMN, between = NAMEOFBETWEENIVCOLUMN, data = NAMEOFDATAFRAME, detailed = TRUE). Note that if giving more than one within or between IV, they should be enclosed in parentheses with a period before the first bracket and a comma separating them: .(WITHINIV1, WITHINIV2). By setting detailed = TRUE, the output will include both the test for sphericity and corrections in case sphericity is violated.

NOTE: the output from the ezANOVA() function sometimes uses scientific notation.
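With the assumed names from the Levene’s test sketch above:

    # Mixed ANOVA: Condition is within, Gender is between
    mixed_model <- ezANOVA(data = mixed_data, dv = Score, wid = ID,
                           within = Condition, between = Gender,
                           detailed = TRUE)
    mixed_model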

The RStudio interface, with code for a mixed factorial ANOVA in the source window. The results of the ANOVA in the console show a non-significant main effect of Gender, a significant main effect of Condition, and a non-significant interaction of Gender*Condition.

Interpreting the Output

Running the above steps will generate the following output:

A close-up of the mixed factorial ANOVA output, showing a non-significant main effect of Gender, a significant main effect of Condition, and a non-significant interaction of Gender*Condition.

Note that the output is split into three sections: 1) ANOVA, 2) Mauchly’s Test for Sphericity, and 3) Sphericity Corrections.

  1. The ANOVA section here has four rows: one row for the intercept of the model (which we ignore), one for each main effect (here, two rows), and one for each interaction (here, one row). Going across the columns of the output:
    • Effect: This column labels which effect is being described in each row.
    • DFn: This column gives the degrees of freedom for the factor (df1). df1 = k-1, where k is the number of levels or conditions in the factor.
    • DFd: This column gives the degrees of freedom for the residuals or error (df2). df2 is calculated differently for effects involving within factors versus between factors.
      1. Between: df2 = n-k, where k is the number of levels in the factor and n is the number of participants or observations.
      2. Within: df2 = (n-1) * (a-1) * (b-1), where n is the number of participants or observations, a is the number of levels in the first factor, and b is the number of levels in the second factor.
      3. For both types of factor, when only one of that kind of factor is included in the effect, the formula will only include the relevant factor: (n-1) * (a-1).
    • SSn: This column gives the sums of squares (SS) for the effect.
    • SSd: This column gives the sums of squares (SS) for the residuals.
    • F: This column gives the test statistic (F) and its value.
    • p: This column gives the p-value indicating whether the test is significant (p<.05), or, said another way, whether there is a difference somewhere between the levels of the factor(s).
    • p<.05: This column indicates which tests have p < .05 with a star.
    • ges: This column gives the general eta squared value for the effect – this is a measure of effect size.
  2. Mauchly’s Test for Sphericity
    • Effect: This column labels which IV is being tested for sphericity violations. Only “within” variables with more than two levels will appear here, including interactions that include these variables.
    • W: This column gives the test statistic (W) for Mauchly’s test and its value.
    • p: This column gives the p-value for each test. If p < .05, the sphericity assumption has been violated and a correction should be applied to the relevant effect(s).
    • p<.05: This column indicates which tests have p < .05 with a star.
  3. Sphericity Corrections. NOTE: this section appears regardless of whether the data have violated the sphericity assumption. Corrections only need to be used when the p-value(s) in section 2) are significant. Two common corrections are given: the Greenhouse-Geisser correction (indicated by “GG”) and the Huynh-Feldt correction (indicated by “HF”).
    • Effect: This column labels which effect the correction is for.
    • GGe / HFe: Epsilon estimate. These are measures of the degree to which the assumption has been violated. If reporting a sphericity correction, the epsilon can be multiplied by the degrees of freedom in the ANOVA section to obtain the new, corrected degrees of freedom.
    • p[GG] / p[HF]: The new p-value for the effect after corrections. If the correction is being applied, this is what should be reported.
    • p[GG]<.05 / p[HF]<.05: This column indicates which corrected tests have p < .05 with a star.

This output indicates that the sphericity assumption was not violated (all ps > .05). Therefore, we can use the output in the ANOVA section when interpreting the results. This ANOVA shows a significant effect of Condition (p < .05), a non-significant effect of Gender (p > .05), and a non-significant interaction of Condition and Gender (p > .05).

Importantly, when p > .05, we cannot say that there is “no difference” between the level(s); we can only say that we failed to find a difference. If p < .05, we could conclude that there is a difference somewhere between the levels, though we would need to follow this up with additional testing to determine which levels are significantly different from each other.

Effect size

The most appropriate effect size measure to use with factorial ANOVAs is partial eta-squared. An effect size can be calculated for each of the effects in the test with the formula: partial eta-squared = SSEffect / (SSEffect + SSResiduals). All of these components can be found in the ANOVA table:

  • SSn column: SSEffect
  • SSd column: SSResiduals

To help prevent human error, we can use code to extract these values out of the ANOVA output. If the ANOVA test has been stored in an object, we can access its contents by using $ANOVA to go into the ANOVA section, another $ operator to select the column, and square brackets [ ] to index the correct row. Notice that the rows in the output are numbered. For example: NAMEOFANOVA$ANOVA$NAMEOFCOLUMN[ROWNUMBER].

The RStudio interface, showing the calculation for partial-eta squared (an effect size) in the source window.

Post hoc tests

When running a factorial ANOVA, each effect that is tested requires its own set of post hoc tests IF the effect in the ANOVA was significant. It is recommended to first follow up on any significant interaction(s), because an interaction can create illusions with main effects. To explain, let’s consider the above example and pretend the interaction was significant. Depending on the pattern of the interaction, it may be that there is an effect of Condition on Score in women but not in men. The effect in women may be so large that we observe a main effect of Condition, despite the effect only being present for one group. In this case, there isn’t a true main effect, since it was not observed in both groups (men and women). Alternatively, if the effect of Condition goes in opposite directions in men and women, this will result in a non-significant main effect of Condition, despite Condition significantly influencing scores in both groups; Condition might not seem to have an effect when really there is something happening there. The final possibility with a significant interaction is that both groups show an effect of Condition in the same direction, but to different degrees. This last scenario will have a significant main effect that is not an illusion, but whose size does depend on the other IV.

Interaction

When following up on a significant interaction, the goal is to determine whether there is an effect of IV1 at each level of IV2. To accomplish this, you must first separate your data based on IV2. We can do this in R using the filter() function. Below we will test whether there is an effect of Condition at each level of Gender, so we will start by separating the data for each level of Gender.

In our separated data we can now test for an effect of Condition in men and women separately. Because there are more than two levels in Condition, we will run two ANOVAs (one for each Gender). If there were only two levels of Condition, we could run t-tests (expert tip: in a mixed factorial ANOVA, you need to carefully consider whether the factor you are running post hoc tests on is within or between; if within, follow up with paired samples t-tests, and if between, follow up with independent sample t-tests). These tests will tell us about the simple main effects in the data.

Expert tip: When working with a mixed factorial ANOVA, it is important to keep in mind which factors are within and which factors are between, because the follow-up tests should match the data. Here, Condition is a within (repeated) factor, so these ANOVAs will be repeated-measures ANOVAs. If we instead broke up our data to test the effect of Gender at each level of Condition, we would run three independent sample t-tests because in this dataset, Gender is a between factor with two levels. If Gender had more than two levels, we would instead use one-way ANOVAs.
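A sketch of these per-group repeated-measures ANOVAs, assuming the names from above and the hypothetical Gender levels "Man" and "Woman":

    # Test the within factor Condition separately for each Gender group;
    # droplevels() discards the unused factor levels after filtering
    men_mixed   <- droplevels(filter(mixed_data, Gender == "Man"))
    women_mixed <- droplevels(filter(mixed_data, Gender == "Woman"))
    ezANOVA(data = men_mixed, dv = Score, wid = ID,
            within = Condition, detailed = TRUE)
    ezANOVA(data = women_mixed, dv = Score, wid = ID,
            within = Condition, detailed = TRUE)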

The RStudio interface, showing the post hoc analysis (here, a repeated-measures ANOVA) for the mixed factorial ANOVA in the source window and the output in the console.

NOTE: the image above only includes the output of one ANOVA, but two should be run, one for each level of Gender.

The output can be interpreted like any repeated-measures ANOVA, and if significant, can be followed up with pairwise tests. See the one-way repeated measures ANOVA section for more details. Be sure to match the pairwise tests to the types of IVs.

Main effect

If you find a significant main effect for a factor that only has two levels, you can simply look at the descriptives to know which is higher / lower (as the main effect already tells you that these are different).

If your factorial ANOVA reveals a significant main effect for a factor that has more than two levels, you need to run follow-up tests to determine the location of the difference(s), as the main effect tells you there is a difference somewhere without specifying which groups might be different. The type of follow-up test will depend on whether that IV or factor is within or between, and the process will also depend on whether the IV or factor you are ignoring is within or between.

For our first example, we will follow up on the main effect of Condition. Condition is a within factor, which means we will run repeated-measures pairwise t-tests. In doing these tests, we will be ignoring Gender, which is a between factor. Because Gender is between, there is no collapsing to do, so we can simply move on to pairwise t-tests using the pairwise.t.test() function. This will test all possible combinations of the levels in the factor. To use this function, specify the DV and IV columns within the dataframe, indicate that these are paired comparisons because Condition was measured within, and apply a p-value correction to minimize the risk of Type I Error. For example: pairwise.t.test(x = NAMEOFDATAFRAME$DVCOLUMN, g = NAMEOFDATAFRAME$IVCOLUMN, paired = TRUE, p.adjust.method = "bonf"). By setting p.adjust.method to "bonf", the test will apply a Bonferroni correction, which multiplies each p-value by the number of comparisons being made.
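With the assumed names:

    # Paired pairwise t-tests across Condition levels (the within factor),
    # Bonferroni-corrected; rows must be ordered consistently by ID
    pairwise.t.test(x = mixed_data$Score, g = mixed_data$Condition,
                    paired = TRUE, p.adjust.method = "bonf")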

The RStudio interface, showing post-hoc pairwise comparisons using paired t-tests as a follow-up of a significant main effect in the mixed factorial ANOVA.

The output of the pairwise.t.test() function gives the p-values for the series of t-tests and indicates the correction applied at the bottom. To read this output, use the row and column headers to identify which p-values belong to which comparisons. In this case, all three t-tests are significant (all ps < .05), so we can conclude that there are differences between all groups.

In our second example, we will reverse our factors. Let’s pretend that Condition is a between factor that we want to follow up on, and Gender is a within factor that we want to ignore. Because we are testing a between factor, the most appropriate post hoc test for pairwise comparisons is Tukey’s HSD. Before we can run this test, we must first collapse across the within factor. This is because we currently have two observations for each participant in their Condition group: one in the male condition and one in the female condition. We should only have one observation per person per group before running our test. To collapse across Gender in this example, we can use the group_by() function and the summarise() function, grouping by the ID column and the IV of interest, and calculating the mean of the DV column. This will give us each participant’s mean Score.

The RStudio interface, showing how to use group_by() and summarise() functions to get the mean score for each participant in each condition.

Now that the data are collapsed, we can run Tukey’s HSD. It will compare each possible pair of groups to determine if they differ. Tukey’s HSD also adjusts p-values based on the number of post hoc comparisons being made to reduce the risk of Type I Error. The TukeyHSD() function in R requires an ANOVA object made by the aov() function. The first step is to run an ANOVA with the aov() function, filling it out as: ANOVAOBJECT <- aov(DVCOLUMN ~ IVCOLUMN, data = NAMEOFDATAFRAME). The next step is to give this object to the Tukey’s test: TukeyHSD(ANOVAOBJECT).
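A sketch of this two-step follow-up, using the assumed names and pretending (as above) that Condition is between and Gender is within:

    # Collapse across the (pretend-within) Gender factor first
    condition_means <- mixed_data %>%
      group_by(ID, Condition) %>%
      summarise(Score = mean(Score), .groups = "drop")
    # One-way between ANOVA, then Tukey's HSD on the Condition groups
    tukey_model <- aov(Score ~ Condition, data = condition_means)
    TukeyHSD(tukey_model)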

The RStudio interface, showing the post-hoc follow-up example for the mixed factorial ANOVA.

The output from this test will show the mean differences between each group (diff), the lower (lwr) and upper (upr) confidence intervals on the mean differences, and the p-value for each comparison adjusted for Type I Error (p adj).


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.