
Analyze Data: R and RStudio

Contributors: Lindsay Plater and Michelle Dollois

Pearson correlation

Pearson correlation (sometimes referred to as Pearson’s r, bivariate correlation, or Pearson product-moment correlation coefficient) is used to determine the strength and direction of a relationship between two continuous variables. This test assumes both variables are approximately normally distributed.

The RStudio interface, with the glimpse() function in the source window to check the data type, and several continuous variables displayed in the console.

This test requires a dataframe with at least two columns of continuous data. Use glimpse(NAMEOFDATAFRAME) to see the structure of your data; the <dbl> label indicates a continuous variable.

Testing assumption(s)

To determine whether the variables being used in your correlation are normally distributed, you can use the Shapiro-Wilk test. In R, you can run this test using the shapiro.test() function on each of the columns you wish to use in the correlation. A significant Shapiro-Wilk statistic (p < .05) for one or more of your variables indicates that you have failed the test of normality, and should either transform your data or switch to a non-parametric test (e.g., the Spearman correlation).
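
For example, a minimal sketch might look like the following (the dataframe name my_data is a placeholder; the column names match the Fake_Data example discussed below):

# Check each variable separately before running the correlation
shapiro.test(my_data$Fake_Data1)
shapiro.test(my_data$Fake_Data2)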

The RStudio interface, with code for a Shapiro-Wilk normality test in the source window. There is one significant (failed assumption) and one non-significant finding in the console.

Notice that the test on the first variable column returns a non-significant p-value (i.e., W = 0.967; p > .05). This indicates that the data in Fake_Data1 did not fail normality. The second test, on the data in Fake_Data2, returns a significant result (i.e., W = 0.881; p < .05). This indicates that the data in Fake_Data2 failed normality. Since one of the variables failed normality, it would be most appropriate to run a Spearman’s rank order correlation instead (i.e., the non-parametric alternative to a Pearson correlation). However, for the purposes of this guide, we will pretend that both variables passed the normality assumption and proceed with the Pearson correlation.

How to run a Pearson Correlation

To run a Pearson’s correlation, use the cor.test() function, passing it the two columns you wish to correlate. There are two methods of providing the two columns: individually, or as a formula.

  1. When giving your variables individually, you provide the variable / column names separated by a comma: cor.test(NAMEOFDATAFRAME$NAMEOFCOLUMN1, NAMEOFDATAFRAME$NAMEOFCOLUMN2). The test will then pair up the entries in each column in the order they appear and run the test of correlation. This method is especially helpful when you want to correlate two sets of numbers that are not stored in the same variable.
  2. When using the formula method, following a tilde (~), provide the column names separated by a plus sign (+) and indicate in which dataframe they can be found: cor.test(~NAMEOFCOLUMN1 + NAMEOFCOLUMN2, data = NAMEOFDATAFRAME). This method is particularly helpful when the data you are correlating are in separate columns in the same dataframe. By default, the cor.test() function will run a Pearson’s correlation, but you can also specify this using the method argument, cor.test(… method = “pearson”). A short sketch showing both approaches appears after this list.
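
A minimal sketch of both approaches (my_data and the column names are placeholders):

# Method 1: pass the two columns individually
cor.test(my_data$Fake_Data1, my_data$Fake_Data2)
# Method 2: pass a formula and name the dataframe (Pearson is the default method)
cor.test(~ Fake_Data1 + Fake_Data2, data = my_data, method = "pearson")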

The RStudio interface, with code for a Pearson correlation in the source window and the output displayed in the console.

Interpreting the Output

Running the above steps will generate the following output:

The RStudio output of a Pearson product-moment correlation test, showing a non-significant correlation.

  1. The name of the statistical test
  2. The test statistic (t) and its value
  3. The degrees of freedom (df) for the test, which is the number of observations per column minus two (n-2)
  4. A p-value indicating whether the test is significant (p<.05), or, said another way, whether the specified variables are correlated
  5. The correlation coefficient (Pearson’s r) capturing the relationship between the two variables
  6. The 95% confidence interval (CI) on the correlation coefficient

Pearson’s r can range from -1 (perfect negative) to +1 (perfect positive), and indicates the strength and direction of the relationship between the two variables. Here, we see a non-significant weak positive correlation between the two continuous variables.

Pearson correlations are best visualized using a scatterplot, which can be easily created in R. Using ggplot2, a package contained within tidyverse, we can use the ggplot() function to build a plot. First specify which data you’d like to use: ggplot(data = NAMEOFDATAFRAME), then use the aes() function inside the ggplot() function to define your axes: ggplot(data = NAMEOFDATAFRAME, aes(y = NAMEOFCOLUMN1, x = NAMEOFCOLUMN2)). Next, use the plus sign (+) to add a layer to the plot you’re building, and specify which type of graph you’d like with a geom. Here, geom_point() will produce a scatterplot using the information provided to the ggplot() function. The plot will then appear in the Plots tab of RStudio.
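
Putting those layers together, a minimal sketch might look like this (my_data and the column names are placeholders; loading tidyverse also loads ggplot2):

library(tidyverse)
# Scatterplot of the two variables being correlated
ggplot(data = my_data, aes(y = Fake_Data1, x = Fake_Data2)) +
  geom_point()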

The RStudio interface, with code for a Pearson correlation in the source window and a scatterplot in the plots (files) window.

One sample t-test

The one sample t-test is used to determine whether the mean of a single continuous variable differs from a specified constant. This test assumes that the observations are independent and that the data are normally distributed. 

Note that this is a parametric test; if the assumptions of this test are not met, you could / should instead run a Wilcoxon signed rank test (i.e., the non-parametric alternative to the one sample t-test).

The RStudio interface, with the glimpse() function in the source window to check the data type, and several continuous variables displayed in the console.

This test requires a dataframe with at least one continuous variable column in it. Use glimpse(NAMEOFDATAFRAME) to see the structure of your data; the <dbl> label indicates a continuous variable.

Testing assumption(s)

To determine whether the variable being used in your t-test is normally distributed, you can use the Shapiro-Wilk test. In R you can run this test using the shapiro.test() function on the column being used in the test, shapiro.test(NAMEOFDATAFRAME$NAMEOFCOLUMN). A significant Shapiro-Wilk statistic (p < .05) for your variable indicates that you have failed the test of normality, and should either transform your data or switch to a non-parametric test (e.g., the Wilcoxon signed rank test).

The RStudio interface, with code for a Shapiro-Wilk normality test in the source window. There is one non-significant finding in the console.

Notice that the test returns a non-significant p-value (i.e., W = 0.967; p > .05). This indicates that the data in Fake_Data1 did not fail normality. Since the variable did not fail the normality assumption, we can continue with the t-test.

How to run a one-sample t-test

To run a one-sample t-test, simply use the t.test() function. This function requires two pieces of information: the data to be tested, and the constant to test them against. Provide the column that contains the data, and use the mu parameter to define the constant: t.test(NAMEOFDATAFRAME$NAMEOFCOLUMN, mu = CONSTANT)

The code above will determine whether the mean of the specified column of data statistically differs from the constant.
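
For example, to test whether the (placeholder) column Fake_Data1 differs from a constant of 10 (the constant here is purely illustrative):

t.test(my_data$Fake_Data1, mu = 10)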

The RStudio interface, with code for a one-sample t-test in the source window and output in the console.

Interpreting the Output

Running the above steps will generate the following output:

The RStudio output of a one-sample t-test, showing a significant difference between the specified variable and the specified constant.

  1. The name of the statistical test
  2. The test statistic (t) and its value
  3. The degrees of freedom (df) for the test which is the number of observations minus one (n-1)
  4. A p-value indicating whether the test is significant (p<.05), or, said another way, whether the specified variable differs from the specified constant
    1. Note: R often provides values in scientific notation. Here, p = 1.424e-10 = 0.0000000001424
    2. Note: the “alternative hypothesis” line indicates the specified constant
  5. The mean of the selected variable
  6. The 95% confidence interval (CI) on the mean

Effect size

Cohen’s d can be used to quantify the effect size on the difference between the data and the constant. Cohen’s d is also referred to as the standardized mean difference and it is the typical effect size used for parametric t-tests.  

For a one-sample t-test, the formula for Cohen’s d is the difference between the mean of the data being tested and the constant, divided by the standard deviation of the data. This can be calculated in R by: 

  1. writing out the formula
  2. using the cohens_d() function from the effectsize package 

The cohens_d() function will also give a 95% confidence interval on the effect size. When using the cohens_d() function with a one-sample comparison, provide the dataframe and column that contains the data, and use the mu parameter to define the constant: cohens_d(NAMEOFDATAFRAME$NAMEOFCOLUMN, mu = CONSTANT).
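
A minimal sketch of both options (my_data, Fake_Data1, and the constant 10 are placeholders; assumes the effectsize package is installed):

# Option 1: write out the formula by hand
(mean(my_data$Fake_Data1) - 10) / sd(my_data$Fake_Data1)
# Option 2: use cohens_d() from the effectsize package (also gives a 95% CI)
library(effectsize)
cohens_d(my_data$Fake_Data1, mu = 10)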

The RStudio interface, with code for Cohen's d in the source window and output in the console.
 

Independent samples t-test

The independent samples t-test (also referred to as a between-subjects t-test, Student’s t-test, unpaired t-test, two-sample t-test…) is used to determine whether two groups’ means on the same continuous variable differ. This test assumes that the observations are independent and are randomly sampled from normal distributions that have the same population variance.

Note that this is a parametric test; if the assumptions of this test are not met, you could / should instead run a Mann Whitney U test (i.e., the non-parametric alternative to the independent samples t-test).

The RStudio interface, with the glimpse() function in the source window to check the data type, and several continuous variables displayed in the console.

This test requires a dataframe with one column of continuous data (DV) and one column of categorical or grouping data (IV). Use glimpse(NAMEOFDATAFRAME) to see the structure of your data; the <dbl> label indicates a continuous variable and the <fct> label indicates a categorical variable. If your grouping column isn’t factor (“<fct>”) type, use: NAMEOFDATAFRAME$NAMEOFIVCOLUMN <- as.factor(NAMEOFDATAFRAME$NAMEOFIVCOLUMN) to change it.

Testing assumption(s)

Normality

To determine whether the data being used in your t-test are normally distributed, you can use the Shapiro-Wilk test. In R you can run this test using the shapiro.test() function on the column being used as the dependent variable (DV) in your t-test. Note, you will need to apply this test to both groups in your test separately. A significant Shapiro-Wilk statistic (p < .05) for one or more of your groups indicates that you have failed the test of normality, and should either transform your data or switch to a non-parametric test (e.g., the Mann Whitney U test).

Since data in R is usually arranged in long format, with one column containing all dependent variable data and one column containing the independent variable labels (for a visualization of long format data, see the Paired-samples t-test section of the LibGuide), we first need to separate our two groups so that we can test normality on each. We can do this using the filter() function from the tidyverse package. When using the filter() function, first specify the dataframe that contains your independent and dependent variable data, then give a logical statement specifying which rows in the dataframe to keep. To build the logical statement, specify the column you wish to filter (i.e., your grouping / independent variable column), then enter a double equals sign (==), and indicate which label in that column to keep: filter(NAMEOFDATAFRAME, NAMEOFIVCOLUMN == “GROUP”). NOTE: the filter() function is case sensitive (i.e., a lowercase “a” is not the same as an uppercase “A”) and the label must match exactly in order to work.
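
A minimal sketch (my_data, the Group column, and the group labels "A" and "B" are placeholders):

library(tidyverse)
# Keep only the rows belonging to each group, saving each subset as its own dataframe
groupA_data <- filter(my_data, Group == "A")
groupB_data <- filter(my_data, Group == "B")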

The RStudio interface, with the filter() function in the source window to break the dataframe down into smaller groups.

Now that the data has been separated, we can use these new dataframes with the shapiro.test() function to test normality of the groups of our dependent variable columns: shapiro.test(NAMEOFDATAFRAME$NAMEOFDVCOLUMN).

The RStudio interface, with the shapiro.test() function in the source window and the output in the console.

Notice that the test returned non-significant p-values (i.e., p = .154 and .681). This indicates that neither group failed normality, and so we can continue with the t-test.

Homogeneity of variance

The second assumption that we can test for in R is the assumption that variances do not differ between groups, also referred to as homogeneity of variance or homoscedasticity. This is typically tested using either a Levene’s test or a Bartlett test; here, we walk through how to conduct a Bartlett test in R using the bartlett.test() function. Pass the function the dependent and independent variable columns as a formula (DV ~ IV) and specify the dataframe they are from: bartlett.test(NAMEOFDVCOLUMN ~ NAMEOFIVCOLUMN, data = NAMEOFDATAFRAME). Note, we use the dataframe that has both groups’ data for this test. A significant Levene’s or Bartlett’s test (p < .05) indicates that you have failed the homogeneity of variance assumption, and should either transform your data, switch to a non-parametric test (e.g., the Mann Whitney U test), or apply an adjustment (see below).
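
For example (my_data, Score, and Group are placeholder names):

# p < .05 would indicate unequal variances between the groups
bartlett.test(Score ~ Group, data = my_data)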

The RStudio interface, with the bartlett.test() function in the source window and the output in the console.

Notice that the test returned a non-significant p-value (i.e., p = .842). This indicates that the data did not fail the homogeneity of variance assumption, and so we can continue with the t-test without an adjustment (see below).

How to run an independent samples t-test

You can use the t.test() function two ways to run an independent samples t-test:

  • If the dependent variable data for each group are in separate columns or separate variables (e.g., like the variables we made to test normality), you can pass these two columns to the function separately: t.test(NAMEOFDATAFRAME1$NAMEOFDVCOLUMN, NAMEOFDATAFRAME2$NAMEOFDVCOLUMN, …).
  • If the dependent variable data for each group are in one column with the grouping labels in another column of the same dataframe, you can pass the t.test() function a formula with the dependent and independent variable columns (DV ~ IV), and specify the dataframe they are from: t.test(NAMEOFDVCOLUMN ~ NAMEOFIVCOLUMN, data = NAMEOFDATAFRAME, …).

The t.test() function has other parameters which are important to set properly in the function. The alternative parameter specifies how you are defining your alternative hypothesis. Unless you have an a priori one-sided / one-tailed hypothesis, this parameter should be set to “two.sided”. The var.equal parameter is set based on the test of equal variances. If the Bartlett test was non-significant, set var.equal to TRUE to use pooled variances in the test. If the Bartlett test was significant, set var.equal to FALSE to run a Welch t-test which uses separate variances for the two groups. An example independent-samples t-test script may look like this: t.test(NAMEOFDVCOLUMN ~ NAMEOFIVCOLUMN, data = NAMEOFDATAFRAME, alternative = “two.sided”, var.equal = TRUE)

Note: it is important that TRUE and FALSE are written in all capital letters for this code to run properly.
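
Putting it together, a minimal sketch of both approaches (all names are placeholders; var.equal = TRUE assumes the Bartlett test was non-significant):

# Formula method: DV and grouping columns are in the same dataframe
t.test(Score ~ Group, data = my_data, alternative = "two.sided", var.equal = TRUE)
# Separate-columns method: one vector of scores per group (e.g., the filtered dataframes made earlier)
t.test(groupA_data$Score, groupB_data$Score, alternative = "two.sided", var.equal = TRUE)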

The RStudio interface, with the t.test() function in the source window and the output in the console.

Interpreting the Output

Running the above steps will generate the following output:

The RStudio output of an independent sample t-test, showing a non-significant difference between the two groups.

  1. The name of the statistical test. This will change to “Welch t-test” if var.equal = FALSE
  2. The test statistic (t) and its value
  3. The degrees of freedom (df) for the test, which is the number of observations minus the number of groups (n-k). Note, when var.equal is set to FALSE, the degrees of freedom are no longer n-k
  4. A p-value indicating whether the test is significant (p<.05), or, said another way, whether the means of the two groups are statistically different from each other
  5. The means for each group
  6. The 95% confidence interval (CI) on the difference between the group means

Effect size

Cohen’s d can be used to quantify the effect size on the difference between the two groups’ means. Cohen’s d is also referred to as the standardized mean difference and it is the typical effect size used for parametric t-tests.

Cohen’s d can be calculated easily in R by using the cohens_d() function from the effectsize package, and this function returns a 95% confidence interval on the effect size. You can use either the formula format or provide two separate columns. The function also allows you to specify the alternative hypothesis (this affects the confidence interval on Cohen’s d), and whether pooled standard deviation should be used. If the Bartlett test was non-significant, set pooled_sd to TRUE; if the Bartlett test was significant, set pooled_sd to FALSE. An example script may look like this: cohens_d(DV ~ IV, data = NAMEOFDATAFRAME, alternative = “two.sided”, pooled_sd = TRUE).

The RStudio interface, with code for Cohen's d in the source window and output in the console.

Paired samples t-test

The paired-samples t-test (also referred to as a dependent t-test, repeated samples t-test, matched pairs t-test…) is used to determine whether the means of two continuous variables (e.g., before and after treatment) from the same group of participants differ. This test assumes that the participants are independent from one another, while the two variables (e.g., measurements) are from the same participant. This test also assumes that the distribution of differences is normally distributed and there are no extreme outliers in the differences. 

Note that this is a parametric test; if the assumptions of this test are not met, you could / should instead run a Wilcoxon signed-rank test (i.e., the non-parametric alternative to the paired-samples t-test).

Testing assumption(s)

Normality

Since the normality assumption for a paired samples t-test is in reference to the differences between the conditions, we must first calculate a difference score. To do this, we will make a new column that contains the result of subtracting the values in condition 1 from the values in condition 2 (note: you could instead subtract the values in condition 2 from the values in condition 1). This should be done within-participant (i.e., each participant has one value for each condition, and we find the difference between the two conditions for each participant).

On the left, an example of wide format data with one row per participant (two columns for scores, labelled based on their condition). On the right, an example of long format data with two rows per participant (condition is one column, score is one column).

If your data is in wide format (one row per participant), we simply subtract the condition 1 column from the condition 2 column and save it in a new third column using the $ operator.
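
A minimal sketch for wide format data (wide_data, Condition1, and Condition2 are placeholder names):

# New column of within-participant difference scores (condition 2 minus condition 1)
wide_data$Difference <- wide_data$Condition2 - wide_data$Condition1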

The RStudio interface, with the head() function in the source window to look at the data in the console.

If your data is in long format (one row per observation), we’ll calculate the differences using the summarise() command and diff() function. First, we will group by Participant since we want to ensure these difference scores are calculated within each Participant, then we can use the summarise() and diff() functions to calculate the differences. Note, the summarise() function and the pipe operator (%>%) are both from the tidyverse package, which will need to be installed and loaded in order for this script to work.
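
A minimal sketch for long format data (long_data, Participant, and Score are placeholder names; assumes exactly two rows per participant, listed in a consistent condition order):

library(tidyverse)
diff_data <- long_data %>%
  group_by(Participant) %>%              # one difference score per participant
  summarise(Difference = diff(Score))    # second row's score minus the first row's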

The RStudio interface, with the head() function in the source window to look at the data in the console.

Now that we have the difference scores, we can test them for normality. To determine whether the differences are normally distributed, you can use the Shapiro-Wilk test. In R you can run this test using the shapiro.test() function on the difference column we just made. A significant Shapiro-Wilk statistic (p < .05) indicates that you have failed the test of normality, and should either transform your data or switch to a non-parametric test (e.g., the Wilcoxon signed-rank test).

To use the shapiro.test() function, give it the dataframe and column with the difference scores: shapiro.test(NAMEOFDATAFRAME$NAMEOFCOLUMN).

The RStudio interface, with the shapiro.test() function in the source window and the output in the console.

Notice that the test returned a non-significant p-value (i.e., p = .502). This indicates that the data did not fail normality, and so we can continue with the t-test.

How to run a paired-samples t-test

You can use the t.test() function two ways to run a paired-samples t-test:

  1. If the dependent variable data for each condition are in separate columns (i.e., wide format) or separate variables, you can pass these two columns to the function: t.test(NAMEOFDATAFRAME1$NAMEOFCOLUMN1, NAMEOFDATAFRAME2$NAMEOFCOLUMN2, paired = TRUE).
    • Note: In the case that the data are split between two dataframes, it is important that the rows line up (e.g., participant 1 is in row 1 of both dataframes).
  2. If the dependent variable data for both conditions are in the same column, with the condition labels in another column of the same dataframe (i.e., long format), pass the t.test() function a formula with the dependent and independent variable columns (DV ~ IV), and specify the dataframe they are from: t.test(NAMEOFDVCOLUMN ~ NAMEOFIVCOLUMN, data = NAMEOFDATAFRAME, paired = TRUE).

Note: it is important that TRUE is written in all capital letters for this code to run properly.
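
For example, with wide format data in two placeholder columns of the same dataframe:

t.test(wide_data$Condition1, wide_data$Condition2, paired = TRUE)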

The RStudio interface, with the t.test() function in the source window and the output in the console.

Interpreting the Output

Running the above steps will generate the following output:

The RStudio output of a paired sample t-test, showing a significant difference between the two conditions of the paired data.

  1. The name of the statistical test
  2. The test statistic (t) and its value
  3. The degrees of freedom (df) for the test which is the number of pairs minus one (n-1)
  4. A p-value indicating whether the test is significant (p<.05), or, said another way, whether the paired data varies between condition 1 and condition 2
    1. Note: R often provides values in scientific notation. Here, p = 7.871e-15 = 0.000000000000007871
  5. The mean difference between conditions
  6. The 95% confidence interval (CI) on the difference between the means

Effect size

Cohen’s d can be used to quantify the effect size on the difference between the two paired conditions’ means. Cohen’s d is also referred to as the standardized mean difference and it is the typical effect size used for parametric t-tests.  

Cohen’s d can be calculated easily in R by using the cohens_d() function from the effectsize package, and this function returns a 95% confidence interval on the effect size. For a paired-samples effect size, provide the two columns of paired scores (one column per condition) and be sure to specify that the data are paired by setting the paired parameter to TRUE. An example script may look like this: cohens_d(x = NAMEOFDATAFRAME$CONDITION1COLUMN, y = NAMEOFDATAFRAME$CONDITION2COLUMN, paired = TRUE).

The RStudio interface, with code for Cohen's d in the source window and output in the console.

One-way ANOVA

The one-way Analysis of Variance (ANOVA) is used to determine whether the means from three or more groups differ. This test assumes that each group is an independent random sample from normally distributed populations, and that the populations have equal variances. 

Note that this is a parametric test; if the assumptions of this test are not met, you could / should instead run a Kruskal-Wallis H test (i.e., the non-parametric alternative to the one-way ANOVA).

The RStudio interface, with the glimpse() function in the source window to check the data type, and several continuous variables displayed in the console.

This test requires a dataframe with one column of continuous data (DV) and one column of categorical or grouping data (IV). Use glimpse(NAMEOFDATAFRAME) to see the structure of your data; the <dbl> label indicates a continuous variable and the <fct> label indicates a categorical variable. If your grouping column isn’t factor (“<fct>”) type, use: NAMEOFDATAFRAME$NAMEOFIVCOLUMN <- as.factor(NAMEOFDATAFRAME$NAMEOFIVCOLUMN) to change it.

Testing assumption(s)

Normality

To determine whether the data being used in your ANOVA are normally distributed, you can use the Shapiro-Wilk test. In R you can run this test using the shapiro.test() function on the dependent variable (DV) for each group (IV) in your ANOVA. A significant Shapiro-Wilk statistic (p < .05) for one or more of your groups indicates that you have failed the test of normality, and should either transform your data or switch to the non-parametric alternative of the one-way ANOVA (e.g., the Kruskal-Wallis H test).

Since data in R is usually arranged in long format, with one column containing all dependent variable data and one column containing the independent variable labels (for a visualization of long format data, see the repeated measures ANOVA section of the LibGuide), we first need to separate the groups so that we can test normality on each. We can do this using the filter() function from the tidyverse package. When using the filter() function, first specify the dataframe that contains your IV and DV data, then give a logical statement specifying which rows in the dataframe to keep. To build the logical statement, specify the column you wish to filter on (i.e., your grouping / independent variable column), followed by a double equals sign (==), and then indicate which content in that column to keep: filter(NAMEOFDATAFRAME, NAMEOFIVCOLUMN == “GROUP”).

NOTE: the filter() function is case sensitive, a lowercase “a” is not the same as an uppercase “A”.

The RStudio interface, with the filter() function in the source window being used to filter by the grouping variable "Colour".

NOTE: in the image above the colour words appear highlighted in the given colour. This is a feature of certain versions of RStudio. If you do not see this in your script, that is okay. 

Now that the data has been separated, we can use these new dataframes with the shapiro.test() function to test normality of the dependent variable column for each group, shapiro.test(NAMEOFDATAFRAME$NAMEOFDVCOLUMN).

The RStudio interface, with code for a Shapiro-Wilk normality test in the source window. All four tests did not fail the normality assumption i.e., had p > .05.

NOTE: though only three tests are shown in the image above, the results of all four tests are important because there are four groups in the data. 

Since all tests returned p > .05, we have satisfied the normality assumption and can continue with the one-way ANOVA. If this assumption was violated for one or more groups, a Kruskal-Wallis H test should be used instead of a one-way ANOVA.

Homogeneity of variance

The second assumption that we can test for in R is the assumption that variances do not differ between groups, also referred to as the homogeneity of variance assumption or homoscedasticity. This is typically tested using either a Levene’s test or a Bartlett test. Here, we walk through how to conduct a Levene’s test in R using the leveneTest() function from the car package. Importantly, we are not going to load in the whole car package as it will interfere with some of the functions in tidyverse. Instead, to access only the leveneTest() function, we can use car:: to select a function within the car package without loading the entire package (NOTE: the package will need to be installed on your device in order for this to work). In the function, give the DV and IV columns as a formula (DV ~ IV), and specify the dataframe they are from, car::leveneTest(NAMEOFDVCOLUMN ~ NAMEOFIVCOLUMN, data = NAMEOFDATAFRAME).
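
A minimal sketch (my_data and Score are placeholders; Colour is the grouping column from this example; assumes the car package is installed but not loaded):

# p < .05 would indicate unequal variances between the groups
car::leveneTest(Score ~ Colour, data = my_data)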

The RStudio interface, with the leveneTest() function in the source window and the output in the console.

Since p > .05, we have satisfied the assumption. If this assumption was violated, a Kruskal-Wallis H test should be used instead of a one-way ANOVA.

How to run a one-way ANOVA

You can use the aov() function to run an ANOVA. This function requires a formula specifying the dependent variable and independent variable (DV ~ IV) and the dataframe they come from: aov( NAMEOFDVCOLUMN ~ NAMEOFIVCOLUMN, data = NAMEOFDATAFRAME). Save this model into a variable using the <- operator, then use the summary() function to see the ANOVA table: summary(NAMEOFMODEL).
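
A minimal sketch (my_data, Score, and my_model are placeholder names; Colour is the grouping column from this example):

my_model <- aov(Score ~ Colour, data = my_data)   # fit the one-way ANOVA
summary(my_model)                                 # print the ANOVA table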

The RStudio interface, with the aov() function and summary() function in the source window and the output in the console.

Interpreting the Output

Running the above steps will generate the following output:

The output from the one-way ANOVA via the summary() function, showing a non-significant main effect of colour.

Note that there are two rows, one for the IV and one for the residuals (also called the error), as well as many columns:

  • Df: This column gives the degrees of freedom for the test, one for the IV (df1) and one for the residuals (df2). df1 = k-1, where k is the number of groups; df2 = n-k, where n is total number of participants. 
  • Sum Sq: This column gives the sums of squares (SS) for the IV and residuals.
  • Mean Sq: This column gives the Mean Square (MS) for the IV and residuals.
  • F value: This column gives the obtained F-value for the test.
  • Pr(>F): This column gives the p-value for the test (i.e., this indicates statistical significance if p < .05).

Here, p = .923; we cannot conclude that there are any statistically significant differences between the groups (as p > .05).

Effect size

For one-way ANOVAs, the typical measure of effect size is η2 (“eta-squared”). This represents the proportion of variation accounted for by the grouping factor (i.e., the IV). It is calculated as follows: SSEffect / SStotal. The SS values can be taken from the ANOVA table, where SStotal = SSEffect + SSresiduals. These values can all be pulled from the ANOVA output:

  • Effect row, Sum Sq column: SSEffect
  • Residuals row, Sum Sq column: SSResiduals
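
One way to do this is to pull the sums of squares straight out of the saved model rather than retyping them (my_model is the placeholder model name from the sketch above):

ss <- summary(my_model)[[1]][["Sum Sq"]]   # first value: SS for the effect, second: SS for the residuals
eta_squared <- ss[1] / sum(ss)             # SSEffect / (SSEffect + SSResiduals)
eta_squared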

The RStudio interface, with a calculation for the effect size in the source window and the output (eta-squared) in the console.

Post hoc tests

Typically, when an ANOVA is non-significant, you would not run post hoc tests. If, however, the ANOVA returned p < .05, this indicates a difference between groups somewhere. To determine where the difference(s) lies, we must run follow-up tests (generally post hoc tests).

Tukey’s HSD is a common post hoc test to assess for differences between groups in a one-way ANOVA; it compares each possible pair of groups to determine if they differ. Tukey’s HSD also adjusts p-values based on the number of comparisons being made (expert tip: this reduces the risk of Type I Error, i.e., incorrectly rejecting the null or incorrectly saying there is a difference between the groups). To run a Tukey’s HSD test, use the TukeyHSD() function and give it the ANOVA model you made: TukeyHSD(NAMEOFMODEL).
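
For example, using the placeholder model name from the earlier sketch:

TukeyHSD(my_model)   # every pairwise group comparison, with Tukey-adjusted p-values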

The RStudio interface, with the TukeyHSD() function in the source window and the output (post hoc pairwise comparisons with a Tukey correction) in the console.

The output from this test will show the mean differences between each group (diff), the lower (lwr) and upper (upr) confidence intervals on the mean differences, and the p-value for each comparison adjusted for Type I Error (p adj). If p < .05, there is a statistically significant difference between those groups.

One-way repeated measures ANOVA

The one-way repeated measures Analysis of Variance (ANOVA) is used to determine whether the means of three or more continuous variables OR measurements at three or more timepoints on the same continuous variable differ for one group. There are several assumptions of this test; two important ones you should consider are normality and sphericity.

Note that this is a parametric test; if the assumptions of this test are not met, you could / should instead run a Friedman test (i.e., the non-parametric alternative to the repeated measures ANOVA). 

This test requires a dataframe with one column of continuous data (DV), one column of categorical or grouping data (IV), and one identification column (ID). The ID column is used to match observations across conditions; for example, if the data belong to participants, the ID column would label which observations belong to participant 1, participant 2, et cetera. There should be one value for each participant in each condition. Use glimpse(NAMEOFDATAFRAME) to see the structure of your data; the <dbl> label indicates a continuous variable and the <fct> label indicates a categorical variable. The DV must be a continuous variable and your IV and ID variables must be categorical variables. If your IV and ID columns aren’t factors (“<fct>”), use: NAMEOFDATAFRAME$NAMEOFCOLUMN <- as.factor(NAMEOFDATAFRAME$NAMEOFCOLUMN) to change them.

The RStudio interface, with the glimpse() function in the source window to check the data type, and several categorical (i.e., factor) variables displayed in the console.

NOTE: This test requires that the data are arranged in Long Format, with each row having a single DV value in a single condition:

On the left, an example of wide format data with one row per participant (two columns for scores, labelled based on their condition). On the right, an example of long format data with two rows per participant (condition is one column, score is one column).

Testing assumption(s)

Normality

To determine whether the data being used in your ANOVA are normally distributed, you can use the Shapiro-Wilk test. In R you can run this test using the shapiro.test() function on the dependent variable (DV) for each condition (IV group) in your ANOVA. A significant Shapiro-Wilk statistic (p < .05) for one or more of your conditions indicates that you have failed the test of normality, and should either transform your data or switch to the non-parametric alternative of the repeated measures ANOVA (e.g., the Friedman test).

Since the data are arranged in long format, with one column containing all dependent variable data and one column containing the independent variable labels, we first need to separate the conditions so that we can test normality on each. We can do this using the filter() function from the tidyverse package. When using the filter() function, first specify the dataframe that contains your IV and DV data, then give a logical statement specifying which rows in the dataframe to keep. To build the logical statement, specify the column you wish to filter on (i.e., your IV column), followed by a double equals sign (==), and then indicate which content in that column to keep: filter(NAMEOFDATAFRAME, NAMEOFIVCOLUMN == “CONDITION”).

NOTE: the filter() function is case sensitive, a lowercase “a” is not the same as an uppercase “A”.

The RStudio interface, with the filter() function in the source window being used to filter by the grouping variable "Condition".

Now that the data have been separated, we can use these new dataframes with the shapiro.test() function to test normality of the dependent variable for each condition: shapiro.test(NAMEOFDATAFRAME$NAMEOFDVCOLUMN).

The RStudio interface, with code for a Shapiro-Wilk normality test in the source window and the output in the console. Two of the three tests failed the normality assumption i.e., had p < .05.

Notice that two of the three tests in this example failed normality (i.e., had p < .05). If one or more of the conditions fail normality, you should switch to a Friedman’s test (i.e., the non-parametric alternative to a repeated measures ANOVA). For the purposes of this guide, we will pretend that all conditions are normal and proceed with the repeated measures ANOVA.

Sphericity

The assumption of sphericity requires that the variance of differences between all conditions are equal. This assumption is important when a repeated-measures factor has more than two levels. Sphericity is assessed when we run the repeated measures ANOVA. If the sphericity assumption is violated, a correction can be applied (which is part of the ANOVA output).

How to run a repeated measures ANOVA

One approach to run a repeated measures ANOVA is to use the aov() function. This function requires a formula specifying the dependent, independent, and identification variables (DV ~ IV + Error(ID / IV)) and the dataframe they come from: aov(NAMEOFDVCOLUMN ~ NAMEOFIVCOLUMN + Error(NAMEOFIDCOLUMN / NAMEOFIVCOLUMN), data = NAMEOFDATAFRAME). Save this model into a variable using the <- operator, then use the summary() function to see the ANOVA table: summary(NAMEOFMODEL).
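
A minimal sketch (my_data, Score, Condition, Participant, and rm_model are placeholder names):

rm_model <- aov(Score ~ Condition + Error(Participant / Condition), data = my_data)
summary(rm_model)   # print the repeated measures ANOVA table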

The RStudio interface, with the aov() function and summary() function in the source window and the output in the console.

Importantly, the above method does not test the assumption of sphericity. To test this assumption, we can use the ezANOVA() function from the ez package. The function can be filled out as such: ezANOVA(dv = NAMEOFDVCOLUMN, wid = NAMEOFIDCOLUMN, within = NAMEOFIVCOLUMN, data = NAMEOFDATAFRAME, detailed = TRUE). Note that the “within” parameter assumes that the IV is a repeated-measures (also called “within”) factor. By setting “detailed = TRUE”, the output will include both the test for sphericity and the corrections to apply if it is violated.

NOTE: the output from the ezANOVA() function sometimes uses scientific notation.

The RStudio interface, with the ezANOVA() function in the source window and the output in the console. This function returns the test of sphericity and the correction to use if sphericity has been violated (p < .05).

Interpreting the Output

Running the above steps will generate the following output:

The output from the repeated measures ANOVA via the ezANOVA() function, showing a significant main effect of Condition. Sphericity did not fail, so no correction is needed.

Note that the output is split into three sections: ANOVA, Mauchly’s Test for Sphericity, and Sphericity Corrections

  1. The ANOVA section has two rows: one for the intercept of the model and one for the IV. We only need to read / interpret the IV row (here, labelled as “Condition”). Going across the columns of the output:
    • Effect: This column labels which effect is being described in each row.
    • DFn: This column gives the degrees of freedom for the IV (df1). df1 = k-1, where k is the number of conditions.
    • DFd: This column gives the degrees of freedom for the residuals (df2). df2 = (n-1) * (k-1), where n is the number of participants and k is the number of conditions.
    • SSn: This column gives the sums of squares (SS) for the IV.
    • SSd: This column gives the sums of squares (SS) for the residuals.
    • F: This column gives the obtained F-value for the IV.
    • p: This column gives the p-value for the test (i.e., this indicates statistical significance if p < .05).
    • p<.05: This column indicates whether the p-value is significant with a star.
    • ges: This column gives the general eta squared value for the IV (i.e., a measure of effect size).
  2. Mauchly’s Test for Sphericity
    • Effect: This column labels which IV is being tested for sphericity violations.
    • W: This column gives the test statistic for Mauchly’s test of sphericity.
    • p: This column gives the p-value for the test. If p < .05, the data violate the sphericity assumption and a correction should be applied.
    • p<.05: This column indicates whether the p-value is significant with a star.
  3. Sphericity Corrections. Note: this section appears regardless of whether the data has violated the sphericity assumption. Corrections only need to be used if the p-value in section 2) is significant. Two common corrections are given: the Greenhouse-Geisser (indicated by “GG”) and the Huynh-Feldt correction (indicated by “HF”).
    • Effect: This column labels which IV the correction is for.
    • GGe or HFe: Epsilon estimate. These are measures of the degree to which the assumption has been violated. If reporting a sphericity correction, the epsilon can be multiplied by the degrees of freedom in the ANOVA section to obtain the corrected degrees of freedom.
    • p[GG] or p[HF]: The new p-value for the effect after corrections (i.e., this indicates statistical significance if p < .05). If the correction is being applied, this is what should be reported.
    • p[GG]<.05 or p[HF]<.05: This column indicates whether the corrected p-value is significant with a star.

Here, Mauchly’s p > .05 (i.e., the sphericity assumption was not violated), so a correction does not need to be applied. Using the ANOVA section of the output, there is a significant effect of Condition because p < .05.

Effect size

The ezANOVA() function already gives one measure of effect size in its output, generalized eta squared (ges). A more common measure of effect size for a repeated measures ANOVA is eta squared. This can be calculated by: SSEffect / (SSEffect + SSResiduals). These values can all be pulled from the ezANOVA() output:

  • IV row, SSn column: SSEffect
  • IV row, SSd column: SSResiduals
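
A minimal sketch, assuming the ezANOVA() result is saved into a placeholder variable and the column names match the earlier placeholders (check the row order of the ANOVA table against your own output):

library(ez)
rm_results <- ezANOVA(dv = Score, wid = Participant, within = Condition,
                      data = my_data, detailed = TRUE)
# With detailed = TRUE, row 1 of the ANOVA table is the intercept and row 2 is the IV
ss_effect    <- rm_results$ANOVA$SSn[2]
ss_residuals <- rm_results$ANOVA$SSd[2]
ss_effect / (ss_effect + ss_residuals)   # eta squared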

The RStudio interface, with the calculation for eta squared in the source window and the value for eta squared in the console.

Post hoc tests

Typically, when an ANOVA is non-significant, you would not run post hoc tests. If, however, the ANOVA returned p < .05, this indicates a difference between conditions somewhere. To determine where the difference(s) lies, we must run follow-up tests (generally paired t-tests).

Paired t-tests are a common post hoc test to assess for differences between groups in a repeated-measures ANOVA; these could compare each possible pair of conditions to determine if they differ. To correct for Type I Error (i.e., incorrectly rejecting the null or incorrectly saying there is a difference between the conditions), the Bonferroni correction should be applied by multiplying each p-value by the number of comparisons being made. In other words, if three paired t-tests are being made, the p-value for each t-test should be multiplied by three. 

Conveniently, the pairwise.t.test() function can run every combination of t-test with Bonferroni corrections for a given set of data. This function is: pairwise.t.test(x = NAMEOFDATAFRAME$NAMEOFDVCOLUMN, g = NAMEOFDATAFRAME$NAMEOFIVCOLUMN, paired = TRUE, p.adjust.method = “bonf”). Since this function does not make use of an identification column, it assumes that paired observations appear in the same order within each condition. Therefore, it is important that the data are sorted by the participant identifier, so that each participant’s rows line up across conditions. This can be done using the arrange() function: NAMEOFDATAFRAME <- arrange(NAMEOFDATAFRAME, NAMEOFIDCOLUMN).
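
A minimal sketch (all names are placeholders; assumes tidyverse is loaded for arrange()):

# Sort by participant so paired observations line up across conditions
long_data <- arrange(long_data, Participant)
# Every pairwise paired t-test, with Bonferroni-adjusted p-values
pairwise.t.test(x = long_data$Score, g = long_data$Condition,
                paired = TRUE, p.adjust.method = "bonferroni")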

The RStudio interface, with the pairwise.t.test() function in the source window and the output in the console.

The output of the pairwise.t.test() function gives the p-values for the paired t-tests and indicates the correction applied (e.g., Bonferroni) at the bottom. To read this output, use the row and column headers to identify which p-values belong to which comparisons. In this case, all three t-tests are significant (p < .05), indicating all of the condition means are different from each other.


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.