Analyze Data: R and RStudio

Contributors: Lindsay Plater, Michelle Dollois and Angelica Nascimento de Oliveira

Introduction to factorial ANOVA

A factorial Analysis of Variance (ANOVA) is used to determine whether the means from two or more variables / factors with two or more levels each differ (i.e., main effects), and whether any of these variables and / or factors interact (i.e., interactions). If there are two factors, this is sometimes called a “two-way” ANOVA; if there are three factors, this is sometimes called a “three-way” ANOVA (et cetera).

There are three separate kinds of factorial ANOVA: fully between factorial ANOVA (where both or all factors have levels that are entirely between-group), fully within factorial ANOVA (where both or all factors have levels that are entirely within-group), and mixed factorial ANOVA (where one or more factors are entirely between-group AND one or more factors are entirely within-group).

Note that all three kinds of factorial ANOVA are parametric tests; if the assumptions of the test are not met, there is no non-parametric equivalent. You could consider transforming your dependent variable in some way and then running the parametric test (and all assumptions) again, though this does not guarantee normality.

Fully between factorial ANOVA

This is an extension of a one-way ANOVA, including two or more factors with two or more levels each that are fully “between” (i.e., each participant / sample is in only one condition). This test assumes that each participant / sample is in only one condition (independence of observations), that there are no outliers, that your dependent variable is approximately normally distributed in each condition (normality), and that each condition has approximately equal variances (homogeneity).

This test requires a dataframe with one continuous column (DV), and one condition or grouping column (IV) per factor in the design. The data should be arranged in long format (for a visualization of long format data, see the one-way repeated measures ANOVA section of the LibGuide). 

It is important before running any factorial ANOVA that the settings in R are set properly. In particular, we must set the contrasts. If this isn’t done, the results of any factorial ANOVA may be incorrect. To set the contrasts properly, run the line: options(contrasts = c("contr.sum", "contr.poly")).
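For reference, a minimal sketch of this setup step:

    # Set sum-to-zero contrasts so factorial ANOVA results are correct
    options(contrasts = c("contr.sum", "contr.poly"))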

The RStudio interface, with the options() function in the source window to set the contrasts, the glimpse() function in the source window to check the data types, and several variables displayed in the console.

Testing assumptions

Normality

To determine whether the data being used in your ANOVA are normally distributed, run a Shapiro-Wilk test on each cell of the data. Cells are defined by the factors in your design: each combination of the levels of the factors creates a cell. For example, if your design has one factor with two levels and one factor with four levels, their combinations create eight cells. The example data used in this tutorial have eight cells.

                        Factor 2
                        Level 1        Level 2        Level 3        Level 4
Factor 1   Level 1      1,1 (cell 1)   1,2 (cell 2)   1,3 (cell 3)   1,4 (cell 4)
           Level 2      2,1 (cell 5)   2,2 (cell 6)   2,3 (cell 7)   2,4 (cell 8)

To test normality, you’ll need to first separate cell data into separate variables, and then run a Shapiro-Wilk test (shapiro.test()) on the dependent variable for each of these variables (i.e., on each group). A significant Shapiro-Wilk statistic (p < .05) for one or more of your groups indicates that you have failed the test of normality.

Since data in R is usually arranged in long format, with one column containing all dependent variable values and one column for each independent variable containing labels, we first need to separate the condition cells so that we can test normality on each. We can do this using the filter() function from the tidyverse package. When using the filter() function, first specify the dataframe that contains your independent and dependent variable data, then give a logical statement specifying which rows in the dataframe to keep. To build the logical statement, specify the column you wish to filter on (i.e., your grouping / independent variable column), followed by a double equals sign (==), and then indicate which content in that column to keep: filter(NAMEOFDATAFRAME, NAMEOFIVCOLUMN == "GROUP"). If you are filtering on more than one thing at a time, separate the arguments in the filter() function with a comma; all arguments must be true for a row to be kept.

NOTE: the filter() function is case sensitive (i.e., a lowercase “a” is not the same as an uppercase “A”) and values must match exactly in order to work.
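As a sketch, assuming a dataframe named mydata with the grouping columns Gender and Colour (as in the screenshot below) and the hypothetical level labels "Woman" and "Red":

    # Load tidyverse for filter() and the %>% pipe used later
    library(tidyverse)
    # Keep only rows where Gender is "Woman" AND Colour is "Red" (one cell)
    woman_red <- filter(mydata, Gender == "Woman", Colour == "Red")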

The RStudio interface, with the filter() function in the source window being used to filter by the grouping variables "Gender" and "Colour".

NOTE: your colour words may appear highlighted in a given colour; this is a feature of certain versions of RStudio.

Now that the data have been separated, we can use these eight new dataframes with the shapiro.test() function to test normality of the dependent variable columns for each condition using: shapiro.test(NAMEOFDATAFRAME$NAMEOFDVCOLUMN).
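Continuing the sketch above (the dataframe woman_red and the DV column Fake_Data1 are assumed names):

    # Shapiro-Wilk test of normality for one cell; repeat for all eight cells
    shapiro.test(woman_red$Fake_Data1)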

The RStudio interface, with code for a Shapiro-Wilk normality test in the source window. The results of three of the eight tests are shown (to pass normality, p > .05).

NOTE: though only three tests are shown in the image above, the results of all eight tests are important because there are eight groups in the data.

Since all tests returned p > .05, we have satisfied the normality assumption and can continue with the factorial ANOVA. 

Homogeneity of variance

The second assumption that we can test for in R is the assumption that variances do not differ between groups, also referred to as homogeneity of variance or homoscedasticity. This is typically tested using either a Levene’s test or a Bartlett test. Here, we walk through how to conduct a Levene’s test in R using the leveneTest() function from the car package, which makes it easy to handle more than one IV. Importantly, we are not going to load the whole car package, as it will interfere with some of the functions in tidyverse. Instead, to access only the leveneTest() function, we can use car:: to select a function within the car package without loading the entire package (NOTE: the package will need to be installed on your device in order for this to work). In the function, give the dependent and independent variable columns as a formula (DV ~ IV1 * IV2), with the independent variables separated by an asterisk, and specify the dataframe they are from: car::leveneTest(NAMEOFDVCOLUMN ~ NAMEOFIVCOLUMN1 * NAMEOFIVCOLUMN2, data = NAMEOFDATAFRAME). Note that we are using the dataframe that has all groups’ data for this test.
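A sketch with the assumed names from above:

    # Levene's test across all Gender x Colour cells (car must be installed)
    car::leveneTest(Fake_Data1 ~ Gender * Colour, data = mydata)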

The RStudio interface, with code for a Levene's homogeneity test in the source window. The result of this test passes homogeneity (to pass, p > .05).

Since p > .05, we have satisfied the assumption.

How to run a fully between factorial ANOVA

You can use the aov() function to run a fully between factorial ANOVA. This function requires a formula specifying the dependent and independent variables (DV ~ IV1 * IV2) and the dataframe they come from: aov(NAMEOFDVCOLUMN ~ NAMEOFIVCOLUMN1 * NAMEOFIVCOLUMN2, data = NAMEOFDATAFRAME). Save this model into a variable using the <- operator, then use the summary() function to see the ANOVA table: summary(NAMEOFMODEL).
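Using the assumed names from the sketches above:

    # Fit the 2 x 4 between-subjects factorial ANOVA and view the ANOVA table
    between_model <- aov(Fake_Data1 ~ Gender * Colour, data = mydata)
    summary(between_model)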

The RStudio interface, with code for a fully between factorial ANOVA in the source window. The results of the ANOVA in the console show a non-significant main effect of Gender, a non-significant main effect of Colour, and a non-significant interaction of Gender*Colour.

Interpreting the Output

Running the above steps will generate the following output:

A close-up of the fully between factorial ANOVA output, showing a non-significant main effect of Gender, a non-significant main effect of Colour, and a non-significant interaction of Gender*Colour.

Note that here there are four rows: one for the main effect of each IV (here, two), one for the interaction between IVs (here, one), and one for the residuals (also called the error), as well as many columns:

  • Df: This column gives the degrees of freedom for each effect (df1) and for the residuals (df2). df1 = k-1, where k is the number of groups / levels for that factor; df2 = k(n-1), where n is the number of participants or observations per cell and k is the total number of cells. Note that df1 may vary based on which effect we are considering (i.e., if your factors have differing numbers of levels or groups).
  • Sum Sq: This column gives the sums of squares (SS) for the IVs and residuals.
  • Mean Sq: This column gives the mean square (MS) for the IVs and residuals.
  • F value: This column gives the test statistic (F) and its value.
  • Pr(>F): This column gives the p-value indicating whether the test is significant (p<.05), or, said another way, whether there is a difference somewhere between the levels of the factor(s).

This ANOVA shows a non-significant main effect of Gender (p > .05), a non-significant main effect of Colour, and a non-significant interaction between Gender and Colour. Importantly, when p > .05, we cannot say that there is “no difference” between the level(s); we can only say that we failed to find a difference. If p < .05, we could conclude that there is a difference somewhere between the levels, though we would need to follow this up with additional testing to determine which levels are significantly different from each other.

Effect size

The most appropriate effect size measure to use with factorial ANOVAs is partial eta-squared. An effect size can be calculated for each of the effects in the test with the formula: partial eta-squared = SSEffect / (SSEffect + SSResiduals). All of these components can be found in the ANOVA table (see the sketch after this list):

  • Effect row, Sum Sq column: SSEffect
  • Residuals row, Sum Sq column: SSResiduals
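As a sketch, the sums of squares can be pulled out of the summary table rather than retyped by hand. The row order here assumes the model formula Gender * Colour from the sketch above; check your own output:

    # Extract the Sum Sq column; rows: Gender, Colour, Gender:Colour, Residuals
    ss <- summary(between_model)[[1]][["Sum Sq"]]
    pes_gender      <- ss[1] / (ss[1] + ss[4])
    pes_colour      <- ss[2] / (ss[2] + ss[4])
    pes_interaction <- ss[3] / (ss[3] + ss[4])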

The RStudio interface, showing the calculation for partial-eta squared (an effect size) in the source window.

Post hoc tests

When running a factorial ANOVA, each effect that is tested requires its own set of post hoc tests IF the effect in the ANOVA was significant. It is recommended to first follow up on any significant interaction(s), because an interaction can create illusions with main effects. To explain, let’s consider the above example and pretend the interaction was significant. Depending on the pattern of the interaction, it may be that there is an effect of Colour on Fake_Data1 scores for women but not for men. The effect in women may be so large that we observe a main effect of Colour, despite the effect only being present for one group. In this case, there isn’t a true main effect, since it was not observed in both groups (men and women). Alternatively, if the effect of Colour goes in opposite directions for men and women, this will result in a non-significant main effect of Colour, despite Colour significantly influencing scores in both groups; Colour might not seem to have an effect when really there is something happening there. The final possibility with a significant interaction is that both groups show an effect of Colour in the same direction, but to different degrees. This last scenario will have a significant main effect that is not an illusion, but whose size does depend on the other IV.

Interaction

When following up on a significant interaction, the goal is to determine whether there is an effect of IV1 at each level of IV2. To accomplish this, you must first separate your data based on IV2. We can do this in R using the filter() function. Below, we will test whether there is an effect of Colour at each level of Gender, so the first step is separating the data for each level of Gender. With the split data, we can test for an effect of Colour in men and women separately. Because there are more than two levels in Colour, we will run this as two ANOVAs (one for each Gender). If there were only two levels of Colour, we could run t-tests (expert tip: in a fully between factorial ANOVA, these should be independent sample t-tests). These tests will tell us about the simple main effects in the data.
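A sketch of this split-then-test approach, assuming the Gender levels "Man" and "Woman" and the names used above:

    # Split the data by Gender, then test the effect of Colour in each half
    men   <- filter(mydata, Gender == "Man")
    women <- filter(mydata, Gender == "Woman")
    summary(aov(Fake_Data1 ~ Colour, data = men))
    summary(aov(Fake_Data1 ~ Colour, data = women))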

The RStudio interface, showing the post hoc analysis (here, a one-way ANOVA) for the fully between factorial ANOVA in the source window and the output in the console.

The output of each test can be interpreted like a standard one-way ANOVA, and if significant, can be followed up with pairwise tests, like Tukey’s HSD. See the one-way ANOVA section for more details.

Main effect

If you find a significant main effect for a factor that only has two levels, you can simply look at the descriptives to know which is higher / lower (as the main effect already tells you that these are different).

If your factorial ANOVA reveals a significant main effect for a factor that has more than two levels, you need to run pairwise comparisons to determine the location of the difference(s), as the main effect tells you there is a difference somewhere without specifying which groups might be different. Tukey’s HSD test is a common post hoc test that can compare every possible pair of groups in the design for differences. When following up on a main effect in a factorial design, we are interested in the differences between marginal means. In the case of the example, this means comparing the differences between the four Colour groups while ignoring Gender. Tukey’s HSD also adjusts p-values based on the number of post hoc comparisons being made to reduce the risk of Type I Error. To run Tukey’s HSD, use the TukeyHSD() function, give it the ANOVA model we made, and specify which effect we want to follow up on using the which argument: TukeyHSD(NAMEOFMODEL, which = "NAMEOFIVCOLUMN").
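With the assumed names from above:

    # Pairwise comparisons of the marginal Colour means, with adjusted p-values
    TukeyHSD(between_model, which = "Colour")
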
The RStudio interface, showing post-hoc TukeyHSD tests as a follow-up of a significant main effect in the fully between factorial ANOVA.

The output from this test will show the mean differences between each group (diff), the lower (lwr) and upper (upr) confidence intervals on the mean differences, and the p-value for each comparison adjusted for Type I Error (p adj). In this example, all six tests are non-significant (all ps > .05), so we cannot conclude that there are differences between the groups.

Fully within factorial ANOVA

This is an extension of a repeated measures ANOVA, including two or more factors with two or more levels each that are fully “within” (i.e., each participant / sample is in each condition). This test assumes that there are no outliers, that your dependent variable is approximately normally distributed in each condition (normality), and that the differences between all conditions are approximately equal (sphericity).

This test requires a dataframe with one continuous column (DV), one condition / grouping column (IV) per factor in the design, and one identification column (ID). The ID column is used to match observations across conditions. For example, if the data measured participants, this column would label which observations belong to Participant 1, Participant 2, et cetera. There should be one value for each participant in each condition. Use glimpse(NAMEOFDATAFRAME) to see the structure of your data; the <dbl> label indicates a continuous variable and the <fct> label indicates a categorical variable. The DV must be a continuous variable, and your IV and ID variables must be categorical variables. If your IV and ID columns aren’t factors (“<fct>”), use: NAMEOFDATAFRAME$NAMEOFCOLUMN <- as.factor(NAMEOFDATAFRAME$NAMEOFCOLUMN) to change them. Additionally, the data must be arranged in long format (for a visualization of long format data, see the one-way repeated measures ANOVA section of the LibGuide) so that it is compatible with the test functions.
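A sketch of this check-and-convert step, assuming a dataframe named within_data with the columns ID, Condition, Time, and Score used in this section:

    # Inspect column types: Score should be <dbl>; ID, Condition, Time should be <fct>
    glimpse(within_data)
    # Convert any non-factor ID / IV columns
    within_data$ID        <- as.factor(within_data$ID)
    within_data$Condition <- as.factor(within_data$Condition)
    within_data$Time      <- as.factor(within_data$Time)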

It is important before running any factorial ANOVA that the settings in R are set properly. In particular, we must set the contrasts. If this isn’t done, the results of any factorial ANOVA may be incorrect. To set the contrasts properly, run the line: options(contrasts = c("contr.sum", "contr.poly")).

The RStudio interface, with the options() function in the source window to set the contrasts, the glimpse() function in the source window to check the data types, and several variables displayed in the console.

Testing assumptions

Normality

To determine whether the data being used in your ANOVA are normally distributed, run a Shapiro-Wilk test on each cell of the data. Cells are defined by the factors in your design: each combination of the levels of the factors creates a cell. For example, if your design has one factor with three levels and one factor with two levels, their combinations create six cells. The example data used in this tutorial have six cells.

                        Factor 2
                        Level 1        Level 2        Level 3
Factor 1   Level 1      1,1 (cell 1)   1,2 (cell 2)   1,3 (cell 3)
           Level 2      2,1 (cell 4)   2,2 (cell 5)   2,3 (cell 6)

To test normality, you’ll need to first separate cell data into separate variables, and then run a Shapiro-Wilk test (shapiro.test()) on the dependent variable for each of these variables (i.e., on each group). A significant Shapiro-Wilk statistic (p < .05) for one or more of your groups indicates that you have failed the test of normality.

We first need to separate the condition cells so that we can test normality on each. We can do this using the filter() function from the tidyverse package. When using the filter() function, first specify the dataframe that contains your independent and dependent variable data, then give a logical statement specifying which rows in the dataframe to keep. To build the logical statement, specify the column you wish to filter on (i.e., your grouping / independent variable column), followed by a double equals sign (==), and then indicate which content in that column to keep: filter(NAMEOFDATAFRAME, NAMEOFIVCOLUMN == "GROUP"). If you are filtering on more than one thing at a time, separate the arguments in the filter() function with a comma; all arguments must be true for a row to be kept.

NOTE: the filter() function is case sensitive (i.e., a lowercase “a” is not the same as an uppercase “A”). Additionally, because the Time column in the data is a factor type, when matching its contents, the numbers must be written in quotation marks.

The RStudio interface, with the filter() function in the source window being used to filter by the grouping variables "Condition" and "Time".

Now that the data have been separated, we can use these six new dataframes with the shapiro.test() function to test normality of the dependent variable columns for each condition using: shapiro.test(NAMEOFDATAFRAME$NAMEOFDVCOLUMN).

The RStudio interface, with code for a Shapiro-Wilk normality test in the source window. The results of two of the six tests are shown (to pass normality, p > .05).

NOTE: though the output of only two tests is shown in the image above, the results of all six tests are important because there are six conditions in the data.

Since all tests returned p > .05, we have satisfied the normality assumption and can continue with the factorial ANOVA.

Sphericity

The assumption of sphericity requires that the variance of the differences between all “within” conditions is equal. This assumption is important when a repeated-measures factor has more than two levels, and we can test it when we run the ANOVA. When this assumption is violated, a correction can be applied to the test.

How to run a fully within factorial ANOVA

To run a fully within factorial ANOVA that also tests for violations of sphericity, we can use the ezANOVA() function from the ez package. The function can be specified as follows: ezANOVA(dv = NAMEOFDVCOLUMN, wid = NAMEOFIDCOLUMN, within = .(NAMEOFIVCOLUMN1, NAMEOFIVCOLUMN2), data = NAMEOFDATAFRAME, detailed = TRUE). Note that when giving more than one repeated-measures IV to the within argument, they are enclosed in parentheses with a period before the first bracket. By setting detailed = TRUE, the output will include both the test for sphericity and corrections in case sphericity is violated.

NOTE: the output from the ezANOVA() function sometimes uses scientific notation.
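A sketch with the assumed names from above (the ez package must be installed):

    library(ez)
    # Fully within ANOVA; detailed = TRUE adds the sphericity test and corrections
    within_model <- ezANOVA(data = within_data, dv = Score, wid = ID,
                            within = .(Condition, Time), detailed = TRUE)
    within_model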

The RStudio interface, with code for a fully within factorial ANOVA in the source window. The results of the ANOVA in the console show a significant main effect of Condition, a significant main effect of Time, and a non-significant interaction of Condition*Time.

Interpreting the Output

Running the above steps will generate the following output:

A close-up of the fully within factorial ANOVA output, showing a significant main effect of Condition, a significant main effect of Time, and a non-significant interaction of Condition*Time.

Note that the output is split into three sections: 1) ANOVA, 2) Mauchly’s Test for Sphericity, and 3) Sphericity Corrections.

  1. The ANOVA section here has four rows: one row for the intercept of the model (which we ignore), one for each main effect (here, two rows), and one for each interaction (here, one row). Going across the columns of the output:
    • Effect: This column labels which effect is being described in each row. If a row includes “:”, that indicates an interaction between two or more factors.
    • DFn: This column gives the degrees of freedom for the factor (df1). df1 = k-1, where k is the number of levels or conditions in the factor.
    • DFd: This column gives the degrees of freedom for the residuals or error (df2). df2 = (n-1) * (a-1) * (b-1), where n is the number of participants or observations, a is the number of levels in the first factor, and b is the number of levels in the second factor. When only one factor is included in the effect, the formula will only include the relevant factor [ for example: (n-1) * (a-1) ].
    • SSn: This column gives the sums of squares (SS) for the effect.
    • SSd: This column gives the sums of squares (SS) for the residuals.
    • F: This column gives the test statistic (F) and its value.
    • p: This column gives the p-value indicating whether the test is significant (p<.05), or, said another way, whether there is a difference somewhere between the levels of the factor(s).
    • p<.05: This column indicates which tests have p < .05 with a star.
    • ges: This column gives the general eta squared value for the effect – this is a measure of effect size.
  2. Mauchly’s Test for Sphericity
    • Effect: This column labels which IV is being tested for sphericity violations. Only “within” variables with more than two levels will appear here, including interactions that include these variables.
    • W: This column gives the test statistic (W) for Mauchly’s test and its value.
    • p: This column gives the p-value for each test. If p < .05, the sphericity assumption has been violated and a correction should be applied to the relevant effect(s).
    • p<.05: This column indicates which tests have p < .05 with a star.
  3. Sphericity Corrections. NOTE: this section appears regardless of whether the data have violated the sphericity assumption. Corrections only need to be used when the p-value(s) in section 2) are significant. Two common corrections are given: the Greenhouse-Geisser correction (indicated by “GG”) and the Huynh-Feldt correction (indicated by “HF”).
    • Effect: This column labels which effect the correction is for.
    • GGe / HFe: Epsilon estimate. These are measures of the degree to which the assumption has been violated. If reporting a sphericity correction, the epsilon can be multiplied by the degrees of freedom in the ANOVA section to obtain the new, corrected degrees of freedom.
    • p[GG] / p[HF]: The new p-value for the effect after corrections. If the correction is being applied, this is what should be reported.
    • p[GG]<.05 / p[HF]<.05: This column indicates which corrected tests have p < .05 with a star.

This output indicates that the sphericity assumption is violated for the Time effect. Therefore, we can use the output in the ANOVA section for the Condition effect and the Condition*Time interaction, but should use the adjusted p-value for the Time effect. With this in mind, this ANOVA shows a significant effect of Condition (p < .05), a significant effect of Time after the correction (p < .05), and a non-significant interaction (p > .05).

Importantly, when p > .05, we cannot say that there is “no difference” between the level(s); we can only say that we failed to find a difference. If p < .05, we could conclude that there is a difference somewhere between the levels, though we would need to follow this up with additional testing to determine which levels are significantly different from each other.

Effect size

The most appropriate effect size measure to use with factorial ANOVAs is partial eta-squared. An effect size can be calculated for each of the effects in the test with the formula: partial eta-squared = SSEffect / (SSEffect + SSResiduals). All of these components can be found in the ANOVA table:

  • SSn column: SSEffect
  • SSd column: SSResiduals

To help prevent human error, we can use code to extract these values out of the ANOVA output. If the ANOVA test has been stored in an object, we can access its contents by using $ANOVA to go into the ANOVA section, another $ operator to select the column, and square brackets [ ] to index the correct row. Notice that the rows in the output are numbered. For example: NAMEOFANOVA$ANOVA$NAMEOFCOLUMN[ROWNUMBER].
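For example, assuming the model object within_model from the sketch above, and assuming the Condition effect sits in row 2 of the ANOVA table (check the printed row numbers in your own output):

    # Partial eta-squared for the Condition effect
    ss_n <- within_model$ANOVA$SSn[2]
    ss_d <- within_model$ANOVA$SSd[2]
    pes_condition <- ss_n / (ss_n + ss_d)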

The RStudio interface, showing the calculation for partial-eta squared (an effect size) in the source window.

Post hoc tests

When running a factorial ANOVA, each effect that is tested requires its own set of post hoc tests IF the effect in the ANOVA was significant. It is recommended to first follow up on any significant interaction(s), because an interaction can create illusions with main effects. To explain, let’s consider the above example and pretend the interaction was significant. Depending on the pattern of the interaction, it may be that there is an effect of Time on Score in Condition A but not in Condition B. The effect in Condition A may be so large that we observe a main effect of Time, despite the effect only being present for one condition. In this case, there isn’t a true main effect, since it was not observed in both conditions (Condition A and Condition B). Alternatively, if the effect of Time goes in opposite directions in Condition A and Condition B, this will result in a non-significant main effect of Time, despite Time significantly influencing scores in both conditions; the interaction will initially hide that Time matters. The final possibility with a significant interaction is that both conditions show an effect of Time in the same direction, but to different degrees. This last scenario will have a significant main effect that is not an illusion, but whose size does depend on the other IV.

Interaction

When following up on a significant interaction, the goal is to determine whether there is an effect of IV1 at each level of IV2. To accomplish this, you must first separate your data based on IV2. We can do this in R using the filter() function. Below we will test whether there is an effect of Time at each level of Condition, so we will start by separating the data for each level of Condition.

In our separated data we can now test for an effect of Time in Condition A and Condition B separately. Because there are more than two levels in Time, we will run repeated measures ANOVAs (one for each Condition). If there were only two levels of Time, we could run t-tests (expert tip: in a fully within factorial ANOVA, these should be paired sample t-tests). These tests will tell us about the simple main effects in the data.

The RStudio interface, showing the post hoc analysis (here, a repeated-measures ANOVA) for the fully within factorial ANOVA in the source window and the output in the console.

The output can be interpreted like any repeated-measures ANOVA, and if significant, can be followed up with pairwise tests. See the one-way repeated-measures ANOVA section for more details.

Main Effect

If you find a significant main effect for a factor that only has two levels, you can simply look at the descriptives to know which is higher / lower (as the main effect already tells you that these are different).

If your factorial ANOVA reveals a significant main effect for a factor that has more than two levels, you need to run pairwise post hoc tests to determine the location of the difference(s), as the main effect tells you there is a difference somewhere without specifying which groups might be different. The first step in conducting repeated-measures pairwise post hoc tests is to collapse across the variable you are not interested in. In our example, to better understand the main effect of Time, we would first collapse across Condition. To do this, we can use the group_by() function and the summarise() function, grouping by the ID column and the IV of interest, and calculating the mean of the DV column. This will give us each participant’s mean Score at each Time point.
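A sketch with the assumed names:

    # Average over Condition to get each participant's mean Score at each Time
    collapsed <- within_data %>%
      group_by(ID, Time) %>%
      summarise(Score = mean(Score), .groups = "drop")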

The RStudio interface, showing how to use group_by() and summarise() functions to get the mean score for each participant at each time point.

Once the data have been collapsed, we can run pairwise t-tests on the Time conditions using the pairwise.t.test() function. This will test all possible combinations of the levels in the factor. To use this function, specify the DV and IV columns within the dataframe, indicate that these are paired comparisons because Time was a within factor, and apply a p-value correction to minimize the risk of Type I Error. For example: pairwise.t.test(x = NAMEOFDATAFRAME$DVCOLUMN, g = NAMEOFDATAFRAME$IVCOLUMN, paired = TRUE, p.adjust.method = "bonf"). By setting p.adjust.method to "bonf", the test will apply a Bonferroni correction, which multiplies each p-value by the number of comparisons being made.
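Continuing the sketch:

    # Paired pairwise t-tests across Time levels with Bonferroni correction;
    # rows must be ordered consistently by ID within each Time level
    pairwise.t.test(x = collapsed$Score, g = collapsed$Time,
                    paired = TRUE, p.adjust.method = "bonf")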

The RStudio interface, showing post-hoc pairwise comparisons using paired t-tests as a follow-up of a significant main effect in the fully within factorial ANOVA.

The output of the pairwise.t.test() function gives the p-values for the series of t-tests and indicates the correction applied at the bottom. To read this output, use the row and column headers to identify which p-values belong to which comparisons. In this example, all three t-tests are significant (all ps < .05), so we can conclude that there are differences between all groups.

Mixed factorial ANOVA

A mixed factorial ANOVA (also sometimes called a split-plot design) is a combination of a one-way ANOVA and a repeated measures ANOVA, where at least one of the factors is fully “between” (i.e., each participant / sample is in only one condition for this factor) and one of the factors is fully “within” (i.e., each participant / sample is in each condition for this factor). This test assumes that there are no outliers, that your dependent variable is approximately normally distributed in each condition (normality), that each condition has approximately equal variances (homogeneity), and that the differences between all conditions are approximately equal (sphericity).

This test requires a dataframe with one continuous column (DV), one condition / grouping column (IV) per factor in the design, and one identification column (ID). The ID column is used to match observations across conditions. For example, if the data measured participants, this column would label which observations belong to Participant 1, Participant 2, et cetera. There should be one value for each participant in each within-subject condition. Use glimpse(NAMEOFDATAFRAME) to see the structure of your data; the <dbl> label indicates a continuous variable and the <fct> label indicates a categorical variable. The DV must be a continuous variable, and your IV and ID variables must be categorical variables. If your IV and ID columns aren’t factors (“<fct>”), use: NAMEOFDATAFRAME$NAMEOFCOLUMN <- as.factor(NAMEOFDATAFRAME$NAMEOFCOLUMN) to change them. Additionally, the data must be arranged in long format (for a visualization of long format data, see the one-way repeated measures ANOVA section of the LibGuide) so that it is compatible with the test functions.

It is important before running any factorial ANOVA that the settings in R are set properly. In particular, we must set the contrasts. If this isn’t done, the results of any factorial ANOVA may be incorrect. To set the contrasts properly, run the line: options(contrasts = c("contr.sum", "contr.poly")).

The RStudio interface, with the options() function in the source window to set the contrasts, the glimpse() function in the source window to check the data types, and several variables displayed in the console.

Testing assumptions

Normality

To determine whether the data being used in your ANOVA are normally distributed, run a Shapiro-Wilk test on each cell of the data. Cells are defined by the factors in your design: each combination of the levels of the factors creates a cell. For example, if your design has one factor with three levels and one factor with two levels, their combinations create six cells. The example data used in this tutorial have six cells.

                        Factor 2
                        Level 1        Level 2        Level 3
Factor 1   Level 1      1,1 (cell 1)   1,2 (cell 2)   1,3 (cell 3)
           Level 2      2,1 (cell 4)   2,2 (cell 5)   2,3 (cell 6)

To test normality, you’ll need to first separate cell data into separate variables, and then run a Shapiro-Wilk test (shapiro.test()) on the dependent variable for each of these variables (i.e., on each group). A significant Shapiro-Wilk statistic (p < .05) for one or more of your groups indicates that you have failed the test of normality.

We first need to separate the condition cells so that we can test normality on each. We can do this using the filter() function from the tidyverse package. When using the filter() function, first specify the dataframe that contains your independent and dependent variable data, then give a logical statement specifying which rows in the dataframe to keep. To build the logical statement, specify the column you wish to filter on (i.e., your grouping / independent variable column), followed by a double equals sign (==), and then indicate which content in that column to keep: filter(NAMEOFDATAFRAME, NAMEOFIVCOLUMN == "GROUP"). If you are filtering on more than one thing at a time, separate the arguments in the filter() function with a comma; all arguments must be true for a row to be kept.

NOTE: the filter() function is case sensitive (i.e., a lowercase “a” is not the same as an uppercase “A”).

The RStudio interface, with the filter() function in the source window being used to filter by the grouping variables "Gender" and "Condition".

Now that the data have been separated, we can use these six new dataframes with the shapiro.test() function to test normality of the dependent variable columns for each condition using: shapiro.test(NAMEOFDATAFRAME$NAMEOFDVCOLUMN).

The RStudio interface, with code for a Shapiro-Wilk normality test in the source window. The results of two of the six tests are shown (to pass normality, p > .05).

NOTE: though the output of only two tests is shown in the image above, the results of all six tests are important because there are six cells in the data.

Since all tests returned p > .05, we have satisfied the normality assumption and can continue with the factorial ANOVA.

Homogeneity of variance

The second assumption that we can test for in R is the assumption that variances do not differ between groups, also referred to as homogeneity of variance or homoscedasticity. This assumption only applies to factors that are between-subjects. It is typically tested using either a Levene’s test or a Bartlett test. Here, we walk through how to conduct a Levene’s test in R using the leveneTest() function from the car package. Importantly, we are not going to load the whole car package, as it will interfere with some of the functions in tidyverse. Instead, to access only the leveneTest() function, we can use car:: to select a function within the car package without loading the entire package (NOTE: the package will need to be installed on your device in order for this to work). In the function, give the dependent and independent variable columns as a formula (DV ~ IV1), and specify the dataframe they are from: car::leveneTest(NAMEOFDVCOLUMN ~ NAMEOFIVCOLUMN, data = NAMEOFDATAFRAME). If there are multiple between-participant IVs, they can all be included, but must be separated with an asterisk (e.g., IV1 * IV2). Note that we are using the dataframe that has all groups’ data for this test.

Importantly for a mixed ANOVA, before running this test we will need to collapse across the levels of any within-participant variable(s). To do this, we can use the group_by() function and the summarise() function, grouping by the ID column and the between-subjects IV, and calculating the mean of the DV column.
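A sketch, assuming a dataframe named mixed_data with the columns ID, Gender (between), Condition (within), and Score used in this section:

    # Collapse across the within factor: one mean Score per participant
    gender_means <- mixed_data %>%
      group_by(ID, Gender) %>%
      summarise(Score = mean(Score), .groups = "drop")
    # Levene's test on the between factor only
    car::leveneTest(Score ~ Gender, data = gender_means)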

The RStudio interface, with code for a Levene's homogeneity test in the source window. The result of this test passes homogeneity (to pass, p > .05).

Since p > .05, we have satisfied the assumption.

Sphericity

The assumption of sphericity requires that the variance of the differences between all “within” conditions is equal. This assumption is important when a repeated-measures factor has more than two levels, and we can test it when we run the ANOVA. When this assumption is violated, a correction can be applied to the test.

How to run a mixed factorial ANOVA

To run a mixed factorial ANOVA that also tests for violations of sphericity, we can use the ezANOVA() function from the ez package. The function can be specified as follows: ezANOVA(dv = NAMEOFDVCOLUMN, wid = NAMEOFIDCOLUMN, within = NAMEOFWITHINIVCOLUMN, between = NAMEOFBETWEENIVCOLUMN, data = NAMEOFDATAFRAME, detailed = TRUE). Note that if giving more than one within or between IV, they should be enclosed in parentheses with a period before the first bracket and a comma separating them: .(WITHINIV1, WITHINIV2). By setting detailed = TRUE, the output will include both the test for sphericity and corrections in case sphericity is violated.

NOTE: the output from the ezANOVA() function sometimes uses scientific notation.
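With the assumed names from the Levene’s test sketch above:

    # Mixed ANOVA: Condition is within, Gender is between
    mixed_model <- ezANOVA(data = mixed_data, dv = Score, wid = ID,
                           within = Condition, between = Gender,
                           detailed = TRUE)
    mixed_model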

The RStudio interface, with code for a mixed factorial ANOVA in the source window. The results of the ANOVA in the console show a non-significant main effect of Gender, a significant main effect of Condition, and a non-significant interaction of Gender*Condition.

Interpreting the Output

Running the above steps will generate the following output:

A close-up of the mixed factorial ANOVA output, showing a non-significant main effect of Gender, a significant main effect of Condition, and a non-significant interaction of Gender*Condition.

Note that the output is split into three sections: 1) ANOVA, 2) Mauchly’s Test for Sphericity, and 3) Sphericity Corrections.

  1. The ANOVA section here has four rows: one row for the intercept of the model (which we ignore), one for each main effect (here, two rows), and one for each interaction (here, one row). Going across the columns of the output:
    • Effect: This column labels which effect is being described in each row.
    • DFn: This column gives the degrees of freedom for the factor (df1). df1 = k-1, where k is the number of levels or conditions in the factor.
    • DFd: This column gives the degrees of freedom for the residuals or error (df2). df2 is calculated differently for effects involving within factors versus between factors.
      1. Between: df2 = n-k, where k is the number of levels in the factor and n is the number of participants or observations.
      2. Within: df2 = (n-1) * (a-1) * (b-1), where n is the number of participants or observations, a is the number of levels in the first factor, and b is the number of levels in the second factor.
      3. For both types of factor, when only one of that kind of factor is included in the effect, the formula will only include the relevant factor: (n-1) * (a-1).
    • SSn: This column gives the sums of squares (SS) for the effect.
    • SSd: This column gives the sums of squares (SS) for the residuals.
    • F: This column gives the test statistic (F) and its value.
    • p: This column gives the p-value indicating whether the test is significant (p<.05), or, said another way, whether there is a difference somewhere between the levels of the factor(s).
    • p<.05: This column indicates which tests have p < .05 with a star.
    • ges: This column gives the general eta squared value for the effect – this is a measure of effect size.
  2. Mauchly’s Test for Sphericity
    • Effect: This column labels which IV is being tested for sphericity violations. Only “within” variables with more than two levels will appear here, including interactions that include these variables.
    • W: This column gives the test statistic (W) for Mauchly’s test and its value.
    • p: This column gives the p-value for each test. If p < .05, the sphericity assumption has been violated and a correction should be applied to the relevant effect(s).
    • p<.05: This column indicates which tests have p < .05 with a star.
  3. Sphericity Corrections. NOTE: this section appears regardless of whether the data have violated the sphericity assumption. Corrections only need to be used when the p-value(s) in section 2) are significant. Two common corrections are given: the Greenhouse-Geisser correction (indicated by “GG”) and the Huynh-Feldt correction (indicated by “HF”).
    • Effect: This column labels which effect the correction is for.
    • GGe / HFe: Epsilon estimate. These are measures of the degree to which the assumption has been violated. If reporting a sphericity correction, the epsilon can be multiplied by the degrees of freedom in the ANOVA section to obtain the new, corrected degrees of freedom.
    • p[GG] / p[HF]: The new p-value for the effect after corrections. If the correction is being applied, this is what should be reported.
    • p[GG]<.05 / p[HF]<.05: This column indicates which corrected tests have p < .05 with a star.

This output indicates that the sphericity assumption was not violated (all ps > .05). Therefore, we can use the output in the ANOVA section when interpreting the results. This ANOVA shows a significant effect of Condition (p < .05), a non-significant effect of Gender (p > .05), and a non-significant interaction of Condition and Gender (p > .05).

Importantly, when p > .05, we cannot say that there is “no difference” between the level(s); we can only say that we failed to find a difference. If p < .05, we could conclude that there is a difference somewhere between the levels, though we would need to follow this up with additional testing to determine which levels are significantly different from each other.

Effect size

The most appropriate effect size measure to use with factorial ANOVAs is partial eta-squared. An effect size can be calculated for each of the effects in the test with the formula: partial eta-squared = SSEffect / (SSEffect + SSResiduals). All of these components can be found in the ANOVA table:

  • SSn column: SSEffect
  • SSd column: SSResiduals

To help prevent human error, we can use code to extract these values out of the ANOVA output. If the ANOVA test has been stored in an object, we can access its contents by using $ANOVA to go into the ANOVA section, another $ operator to select the column, and square brackets [ ] to index the correct row. Notice that the rows in the output are numbered. For example: NAMEOFANOVA$ANOVA$NAMEOFCOLUMN[ROWNUMBER].

The RStudio interface, showing the calculation for partial-eta squared (an effect size) in the source window.

Post hoc tests

When running a factorial ANOVA, each effect that is tested requires its own set of post hoc tests IF the effect in the ANOVA was significant. It is recommended to first follow up on any significant interaction(s), because an interaction can create illusions with main effects. To explain, let’s consider the above example and pretend the interaction was significant. Depending on the pattern of the interaction, it may be that there is an effect of Condition on Score in women but not in men. The effect in women may be so large that we observe a main effect of Condition, despite the effect only being present for one group. In this case, there isn’t a true main effect, since it was not observed in both groups (men and women). Alternatively, if the effect of Condition goes in opposite directions in men and women, this will result in a non-significant main effect of Condition, despite Condition significantly influencing scores in both groups; Condition might not seem to have an effect when really there is something happening there. The final possibility with a significant interaction is that both groups show an effect of Condition in the same direction, but to different degrees. This last scenario will have a significant main effect that is not an illusion, but whose size does depend on the other IV.

Interaction

When following up on a significant interaction, the goal is to determine whether there is an effect of IV1 at each level of IV2. To accomplish this, you must first separate your data based on IV2. We can do this in R using the filter() function. Below we will test whether there is an effect of Condition at each level of Gender, so we will start by separating the data for each level of Gender.

In our separated data we can now test for an effect of Condition in men and women separately. Because there are more than two levels in Condition, we will run two ANOVAs (one for each Gender). If there were only two levels of Condition, we could run t-tests (expert tip: in a mixed factorial ANOVA, you need to carefully consider whether the factor you are running post hoc tests on is within or between; if within, follow up with paired samples t-tests, and if between, follow up with independent sample t-tests). These tests will tell us about the simple main effects in the data.

Expert tip: When working with a mixed factorial ANOVA, it is important to keep in mind which factors are within and which factors are between, because the follow-up tests should match the data. Here, Condition is a within (repeated) factor, so these ANOVAs will be repeated-measures ANOVAs. If we instead broke up our data to test the effect of Gender at each level of Condition, we would run three independent sample t-tests because in this dataset, Gender is a between factor with two levels. If Gender had more than two levels, we would instead use one-way ANOVAs.
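A sketch of these per-group repeated-measures ANOVAs, assuming the names from above and the hypothetical Gender levels "Man" and "Woman":

    # Test the within factor Condition separately for each Gender group;
    # droplevels() discards the unused factor levels after filtering
    men_mixed   <- droplevels(filter(mixed_data, Gender == "Man"))
    women_mixed <- droplevels(filter(mixed_data, Gender == "Woman"))
    ezANOVA(data = men_mixed, dv = Score, wid = ID,
            within = Condition, detailed = TRUE)
    ezANOVA(data = women_mixed, dv = Score, wid = ID,
            within = Condition, detailed = TRUE)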

The RStudio interface, showing the post hoc analysis (here, a repeated-measures ANOVA) for the mixed factorial ANOVA in the source window and the output in the console.

NOTE: the image above only includes the output of one ANOVA, but two should be run, one for each level of Gender.

The output can be interpreted like any repeated-measures ANOVA, and if significant, can be followed up with pairwise tests. See the one-way repeated measures ANOVA section for more details. Be sure to match the pairwise tests to the types of IVs.

Main effect

If you find a significant main effect for a factor that only has two levels, you can simply look at the descriptives to know which is higher / lower (as the main effect already tells you that these are different).

If your factorial ANOVA reveals a significant main effect for a factor that has more than two levels, you need to run follow-up tests to determine the location of the difference(s), as the main effect tells you there is a difference somewhere without specifying which groups might be different. The type of follow-up test will depend on whether that IV or factor is within or between, and the process will also depend on whether the IV or factor you are ignoring is within or between.

For our first example, we will follow up on the main effect of Condition. Condition is a within factor, which means we will run repeated-measures pairwise t-tests. In doing these tests, we will be ignoring Gender, which is a between factor. Because Gender is between, there is no collapsing to do, so we can simply move on to pairwise t-tests using the pairwise.t.test() function. This will test all possible combinations of the levels in the factor. To use this function, specify the DV and IV columns within the dataframe, indicate that these are paired comparisons because Condition was measured within, and apply a p-value correction to minimize the risk of Type I Error. For example: pairwise.t.test(x = NAMEOFDATAFRAME$DVCOLUMN, g = NAMEOFDATAFRAME$IVCOLUMN, paired = TRUE, p.adjust.method = "bonf"). By setting p.adjust.method to "bonf", the test will apply a Bonferroni correction, which multiplies each p-value by the number of comparisons being made.
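With the assumed names:

    # Paired pairwise t-tests across Condition levels (the within factor),
    # Bonferroni-corrected; rows must be ordered consistently by ID
    pairwise.t.test(x = mixed_data$Score, g = mixed_data$Condition,
                    paired = TRUE, p.adjust.method = "bonf")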

The RStudio interface, showing post-hoc pairwise comparisons using paired t-tests as a follow-up of a significant main effect in the mixed factorial ANOVA.

The output of the pairwise.t.test() function gives the p-values for the series of t-tests and indicates the correction applied at the bottom. To read this output, use the row and column headers to identify which p-values belong to which comparisons. In this case, all three t-tests are significant (all ps < .05), so we can conclude that there are differences between all groups.

In our second example, we will reverse our factors. Let’s pretend that Condition is a between factor that we want to follow up on, and Gender is a within factor that we want to ignore. Because we are testing a between factor, the most appropriate post hoc test for pairwise comparisons is Tukey’s HSD. Before we can run this test, we must first collapse across the within factor. This is because we currently have two observations for each participant in their Condition group: one in the male condition and one in the female condition. We should only have one observation per person per group before running our test. To collapse across Gender in this example, we can use the group_by() function and the summarise() function, grouping by the ID column and the IV of interest, and calculating the mean of the DV column. This will give us each participant’s mean Score.

The RStudio interface, showing how to use group_by() and summarise() functions to get the mean score for each participant in each condition.

Now that the data are collapsed, we can run Tukey’s HSD. It will compare each possible pair of groups to determine if they differ. Tukey’s HSD also adjusts p-values based on the number of post hoc comparisons being made to reduce the risk of Type I Error. The TukeyHSD() function in R requires an ANOVA object made by the aov() function. The first step is to run an ANOVA with the aov() function, filling it out as: ANOVAOBJECT <- aov(DVCOLUMN ~ IVCOLUMN, data = NAMEOFDATAFRAME). The next step is to give this object to the Tukey’s test: TukeyHSD(ANOVAOBJECT).
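A sketch of this two-step follow-up, using the assumed names and pretending (as above) that Condition is between and Gender is within:

    # Collapse across the (pretend-within) Gender factor first
    condition_means <- mixed_data %>%
      group_by(ID, Condition) %>%
      summarise(Score = mean(Score), .groups = "drop")
    # One-way between ANOVA, then Tukey's HSD on the Condition groups
    tukey_model <- aov(Score ~ Condition, data = condition_means)
    TukeyHSD(tukey_model)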

The RStudio interface, showing the post-hoc follow-up example for the mixed factorial ANOVA.

The output from this test will show the mean differences between each group (diff), the lower (lwr) and upper (upr) confidence intervals on the mean differences, and the p-value for each comparison adjusted for Type I Error (p adj).


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.