RStudio has many functions that accomplish the same task. In the previous section, an example added three columns of data together and divided the result by 3. This is the long-hand way to calculate the mean, which is manageable if you only have a few columns of data. With many columns, this approach quickly becomes tedious and error-prone.
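As a minimal sketch of this point, the example below computes a row-wise mean both long-hand and with the base R function rowMeans(). The data frame name (scores) and its columns are hypothetical and are used only for illustration.

```r
# Hypothetical data: three test scores per student (names are illustrative).
scores <- data.frame(
  test1 = c(80, 65, 90),
  test2 = c(70, 75, 85),
  test3 = c(90, 70, 95)
)

# Long-hand mean: add the columns and divide by the number of columns.
scores$mean_longhand <- (scores$test1 + scores$test2 + scores$test3) / 3

# Same calculation with rowMeans(), which works for any number of columns.
scores$mean_rowmeans <- rowMeans(scores[, c("test1", "test2", "test3")])

print(scores)
```

Both new columns should hold the same values; the rowMeans() version does not need to change when more columns are added to the selection.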
There are many ways to calculate descriptive statistics in RStudio, including procedures that calculate measures such as central tendency (mean, median, mode), dispersion (range, standard deviation, variance, minimum and maximum), and kurtosis and skewness. These kinds of descriptive statistics are best suited to describe continuous variables.
summary(NAMEOFDATAFRAME)
where NAMEOFDATAFRAME is the name you have given your data. This function returns several summary statistics at once for each column of data: the minimum, maximum, median, mean, and 1st and 3rd quartiles. If you want these values for just one column, you can use the NAMEOFDATAFRAME$NAMEOFCOLUMN format, for example summary(NAMEOFDATAFRAME$NAMEOFCOLUMN).
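The following sketch shows summary() applied to a whole data frame and to a single column. The data frame name (survey), its columns, and its values are assumptions made up for this illustration.

```r
# Hypothetical data frame; the name "survey" and its columns are illustrative.
survey <- data.frame(
  age    = c(21, 34, 28, 45, 39),
  height = c(160, 175, 168, 182, 171)
)

# Summary statistics for every column at once.
print(summary(survey))

# Summary statistics for a single column, selected with the $ operator.
age_summary <- summary(survey$age)
print(age_summary)
```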
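sapply(NAMEOFDATAFRAME, FUNCTION, na.rm = TRUE)
applies the specified function (e.g., mean) to every column of the data frame; na.rm = TRUE tells the function to ignore missing values.
NAMEOFDATAFRAME %>% summarise(FUNCTION(NAMEOFCOLUMN, na.rm = TRUE))
applies the specified function to a single column. This format uses the pipe operator (%>%), which can be read as “read my dataframe and then summarize it using the specified function on the specified column”. Both summarise() and the pipe operator are available through the dplyr package.
There are many ways to calculate frequency and contingency tables in RStudio. These kinds of descriptive statistics are best suited to describe categorical variables.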
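Before turning to those tables, here is a minimal sketch of the sapply() and summarise() calls for continuous variables. The data frame (survey) and its values are hypothetical, and the dplyr part only runs if that package is installed.

```r
# Hypothetical data frame with one missing value; names are illustrative.
survey <- data.frame(
  age    = c(21, 34, NA, 45, 39),
  height = c(160, 175, 168, 182, 171)
)

# Apply mean() to every column; na.rm = TRUE is passed on to mean().
column_means <- sapply(survey, mean, na.rm = TRUE)
print(column_means)

# The same idea for one column with dplyr, if the package is installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  age_mean <- survey %>% summarise(mean_age = mean(age, na.rm = TRUE))
  print(age_mean)
}
```

Without na.rm = TRUE, the mean of the age column would be returned as NA because of the missing value.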
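table(NAMEOFDATAFRAME$COLUMNS)
or table(NAMEOFDATAFRAME$ROWS, NAMEOFDATAFRAME$COLUMNS)
to generate frequency tables. The first form counts the responses in each category of one variable; the second cross-tabulates two variables. You can cross-tabulate further variables by adding them as extra arguments (e.g., , NAMEOFDATAFRAME$NAMEOFCOLUMN), and you can include missing values in the counts by adding , useNA = "ifany" to the call.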
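As a rough illustration, the sketch below builds one-way and two-way tables and then passes the two-way table to prop.table() and margin.table(), which are described in the text that follows. The data frame (responses) and its values are made up for this example; only the variable names Gender and Colour echo the ones mentioned in this guide.

```r
# Hypothetical survey responses; the values are made up for illustration.
responses <- data.frame(
  Gender = c("female", "female", "male", "male", "female", "male"),
  Colour = c("blue", "red", "blue", "blue", NA, "red")
)

# One-way frequency table; useNA = "ifany" adds a count for missing values.
colour_counts <- table(responses$Colour, useNA = "ifany")
print(colour_counts)

# Two-way table: Gender in the rows, Colour in the columns.
# By default, rows with a missing value are left out of the counts.
gender_by_colour <- table(responses$Gender, responses$Colour)
print(gender_by_colour)

# Row proportions: the values in each row sum to 1.
row_props <- prop.table(gender_by_colour, 1)
print(row_props)

# Marginal counts for each category of the row variable.
row_totals <- margin.table(gender_by_colour, 1)
print(row_totals)
```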
prop.table(NAMEOFTABLE)
to generate tables of proportions, where NAMEOFTABLE is a table created with the table() function. This returns the contingency table (i.e., the crosstabs) showing the proportion of responses for each combination of variables, calculated over the whole table. Alternatively, you can ask for row proportions using prop.table(NAMEOFTABLE, 1) or column proportions using prop.table(NAMEOFTABLE, 2). If you use the “1” format, the crosstabs will show the proportion of responses within each category of the row variable (here, “Gender”: the “female” proportions will add up to 1, i.e., 100%, and the “male” proportions will add up to 1). If you use the “2” format, the crosstabs will show the proportion of responses within each category of the column variable (here, “Colour”: the “blue” proportions will add up to 1, etc.).
margin.table(NAMEOFTABLE)
to generate marginal frequencies, where NAMEOFTABLE is a table created with the table() function. This returns the total count across all cells of the table (i.e., the number of rows in the dataset that were counted in the table). Alternatively, you can ask for row totals using margin.table(NAMEOFTABLE, 1) or column totals using margin.table(NAMEOFTABLE, 2). If you use the “1” format, the marginal total will show the count of responses for each category of the row variable. If you use the “2” format, the marginal total will show the count of responses for each category of the column variable.
NAMEOFDATAFRAME %>% group_by(NAMEOFCOLUMN) %>% summarise(group_frequency = n())
to generate frequencies, where n() is a function that counts observations. This format uses the pipe operator (%>%), which can be read as “read my dataframe and then group by this column and then summarise it using the specified function”.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.