
Analyze Data: SPSS

Contributors: Lindsay Plater and Narjes Mosavari

Linear regression

Linear regression is used when we want to make predictions about a continuous dependent variable (also called an outcome variable) based on one or more independent variables (also called predictor variables). This requires a single continuous outcome variable, and one (simple linear regression) or more (multiple linear regression) predictor variables.

Note that this is a parametric test; if the assumptions of this test are not met, you could / should transform your data (and check all assumptions again). For a refresher on how to check normality, read the “Explore” procedure on the Descriptive Statistics SPSS LibGuide page. For this test, normality is calculated on the RESIDUALS (see below for more details). Ideally, your p-value should be > .05, your histogram should approximate a normal distribution (i.e., a standard “bell-shaped curve”), and the points on your P-P plot (see below) should be fairly close to the line.

How to run a linear regression

  1. Click on Analyze. Select Regression. Select Linear.
  2. Place the continuous outcome variable in the “Dependent” box.
  3. Place one or more categorical predictor variables (NOTE: you must create n - 1 dummy variables, where n is the number of categories of the categorical predictor variable) and / or one or more continuous predictor variables in the “Independent(s)” box.
  4. If running multiple linear regression, click the “Statistics” button and ensure “Collinearity diagnostics” is checked.
  5. Click the “Plots” button. Move “*ZRESID” to the “Y” box, move “*ZPRED” to the “X” box, and ensure “Normal probability plot” is checked in the “Standardized Residual Plots” section.
  6. Click the “Save” button and ensure “Standardized” is checked in the “Residuals” section.
  7. Click OK to run the test (results will appear in the output window).

SPSS in data view, with the Analyze > Regression > Linear dialogue box open. "Fake_data1" is the outcome variable, Gender and "Fake_data2" are the predictor variables.
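Expert tip: if you prefer working in a syntax window (or you click “Paste” instead of “OK” in the dialogue box), the steps above correspond roughly to the sketch below. The variable names come from the screenshot (Fake_data1 as the outcome, Fake_data2 as a continuous predictor), and gender_dummy is a hypothetical 0 / 1 dummy variable created from Gender; if your Gender variable is already coded 0 / 1, you can enter it directly and skip the RECODE.

  * Hypothetical dummy coding for a two-category string variable (adjust the values to your data).
  RECODE Gender ('M'=0) ('F'=1) INTO gender_dummy.
  EXECUTE.

  * Linear regression with collinearity diagnostics, residual plots, and saved standardized residuals.
  REGRESSION
    /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
    /DEPENDENT Fake_data1
    /METHOD=ENTER gender_dummy Fake_data2
    /SCATTERPLOT=(*ZRESID ,*ZPRED)
    /RESIDUALS NORMPROB(ZRESID)
    /SAVE ZRESID.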

Reminder! Normality of the residuals is one of the assumptions of a linear regression. By running the above steps, we should have a new column called “ZRE_1” (standardized residuals) in our data view; this is the column you should run through the “Explore” procedure to check normality of the residuals.
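A minimal syntax sketch of that normality check, assuming the saved residual column is named ZRE_1 as above:

  * Explore procedure on the standardized residuals: descriptives, histogram, and normality tests / plots.
  EXAMINE VARIABLES=ZRE_1
    /PLOT HISTOGRAM NPPLOT
    /STATISTICS DESCRIPTIVES.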

Interpreting the output

Running the above steps will generate the following output: a Model Summary table, an ANOVA table, a Coefficients table, and a Charts section (with a normal P-P plot). Additionally, running the Explore procedure will generate the Tests of Normality table.

The navigation pane after running linear regression and Explore (normality) in SPSS.

The Model Summary table indicates the R-Square and Adjusted R-Square values. These are two methods to determine the proportion of the outcome variable’s variance that is being accounted for by the use of the predictor variable(s) included within the model. The R-squared value provides an indication of the goodness of fit of the model produced; we can see that roughly 9.1% of the variance of the outcome variable (fake_data1) is being accounted for by this model (gender and fake_data2).

The Model summary table from the linear regression procedure in SPSS, showing a low R-square value (9.1%).

The ANOVA table indicates whether the model you specified significantly predicts the outcome variable. Here, a p-value less than .05 in the “Sig.” column is required; if your p-value is greater than .05, this indicates poor model fit (and the regression model should NOT be interpreted further).

The ANOVA table from the linear regression procedure in SPSS, showing a non-significant model (p > .05).

The Coefficients table is the regression proper. This includes a constant row (generally ignored unless you are creating a prediction equation), one row for each continuous predictor variable, and n - 1 rows for any categorical predictor variables (where n is the number of categories of the categorical predictor variable). If you do not have the correct / expected number of rows in this table, your regression has been run incorrectly; ensure you have created the appropriate dummy variables for your categorical predictor(s).

  • The Unstandardized Coefficients “B” column and the “Sig.” column are used to interpret the influence of the specified predictor variable on the outcome variable.
  • For continuous predictor variables, we interpret significant values as follows: holding all other variables constant, on average, a one-unit increase in the predictor variable resulted in an increase [positive unstandardized B value] or a decrease [negative unstandardized B value] of B units in the outcome variable.
    • Here, fake_data2 is non-significant, but each one unit increase in fake_data2 resulted in (on average and accounting for Gender) a 10.2% increase in fake_data1.
  • For categorical predictor variables, we interpret significant values as follows: holding all other variables constant, on average, category 1 [the coded category] resulted in an increase [positive unstandardized B value] or a decrease [negative unstandardized B value] of B units in the outcome variable compared to category 0 [the reference category].
    • Here, gender is non-significant, but (on average and accounting for fake_data2), females had a 69.1% lower score on fake_data1 than males.
  • Embedded within the Coefficients table is the Collinearity Statistics section, which reports the variance inflation factor (VIF) scores for each variable in a multiple linear regression. We can use the VIF scores to assess the assumption of multicollinearity.
    • As VIF values increase, the likelihood of multicollinearity being present within the model increases; VIF values < 3 indicate very low correlation between predictor variables, VIF values between 3 and 8 indicate some correlation (and a potential risk of multicollinearity), and VIF values above 8 – 10 indicate high correlation (and likely multicollinearity).
    • If you have high multicollinearity, this indicates that your continuous predictor variables are explaining the same variance (and thus variables with high VIF scores should be removed from the regression).

The Coefficients table from the linear regression procedure in SPSS, showing non-significant Gender and Fake_data2 predictor variables (p > .05).

The Charts section includes a Normal P-P Plot, which can be used to visually assess the normality of the residuals assumption. To pass the normality of the residuals assumption, the data points should fall close to or on the line; if the data fall far from the line, the normality of the residual assumption has failed and you could / should consider transforming your data (and checking all assumptions).

The Normal P-P plot from the linear regression procedure in SPSS, showing data-points that fall roughly close to the line.

The Tests of Normality table returns both the Kolmogorov-Smirnov statistic and the Shapiro-Wilk statistic of the standardized residual. If p > .05 (in the “Sig.” column), you have passed the normality of the residuals assumption for linear regression; if p < .05 (in the “Sig.” column), you have failed the normality of the residuals assumption and you could / should consider transforming your data (and re-checking all assumptions).

The Tests of Normality table from the Explore procedure in SPSS, indicating whether the residuals have passed normality (p > .05) or failed normality (p < .05).

Logistic regression

Logistic regression is used when we want to make predictions about a binary dependent variable (also called an outcome variable) based on one or more independent variables (also called predictor variables). This requires a single categorical (binary) outcome variable, and one (simple logistic regression) or more (multiple logistic regression) predictor variables.

How to assess linearity

There is an assumption that must be met if you are including one or more continuous predictor variables in your logistic regression model: linearity. The assumption of linearity assesses whether the relationship between the logit transformation of the outcome variable and any continuous predictor variable(s) is, in fact, linear. NOTE: you only need to check linearity for continuous predictor variables, not for categorical predictor variables.

To test this assumption, we will use the Box-Tidwell test. To do this, we include all predictors in the model as normal, but we also include an interaction term for each continuous predictor. For example, if your model includes continuous predictor variable “X”, when you check for linearity your model must include “X” as well as “X * ln(X)” (the interaction).

  1. Click Transform. Select Compute Variable.
  2. Add a name for your interaction term in the “Target Variable” box.
  3. In the “Numeric Expression” box, use the following format: NAMEOFPREDICTOR * ln(NAMEOFPREDICTOR)
  4. Click OK (your new column of data will be the rightmost column of data).
  5. Repeat this procedure (steps 1-4) for any other continuous variables.

SPSS data view. Transform, Compute Variable has been selected. An age_interaction term has been created using Age * ln(Age).
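In syntax, the interaction term from the screenshot could be created like this (assuming your continuous predictor is named Age):

  * Box-Tidwell interaction term: the predictor multiplied by its natural log.
  COMPUTE age_interaction = Age * LN(Age).
  EXECUTE.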

  6. Create your logistic regression model as normal with ALL predictor variables you plan to include, then add the interaction term(s) you computed. Click OK.

SPSS in Data View. Analyze, Regression, Binary Logistic has been selected. A multiple regression with linearity checking is in the pop-up box.
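A syntax sketch of this linearity check, assuming a binary outcome named survived, predictors Age and Sex (with Sex coded numerically and declared categorical), and the age_interaction term computed above; your variable names will differ:

  * Box-Tidwell check: the usual model plus the X * ln(X) interaction term(s).
  LOGISTIC REGRESSION VARIABLES survived
    /METHOD=ENTER Age Sex age_interaction
    /CONTRAST (Sex)=Indicator.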

To pass the linearity assumption, the interaction term(s) (e.g., X * ln(X)) must be non-significant (e.g., the value in X * ln(X)’s “Sig.” column must be > .05); if one or more of the interaction terms are significant (e.g., p < .05), you have failed the assumption of linearity.

Logistic regression output from SPSS. The age variable has failed linearity (age_interaction p < .05).

To proceed with the logistic regression if you have failed linearity, you could categorize (e.g., “bin”) your continuous variable(s), or you could try a transformation [expert tip: start by transforming the predictor variable, and if that doesn’t work, you can try also transforming the outcome variable; if a transformation is required to produce a linear relationship between a predictor variable and outcome variable, all subsequent models must incorporate both the untransformed predictor variable and the transformed predictor variable]. Here, we have failed linearity since the Age * ln(Age) interaction term returned p = .007. To continue with this example, we will use the age_binned variable (0 = age < 18; 1 = age 18+).
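One way to bin the age variable in syntax, matching the cut-point described above (0 = age < 18; 1 = age 18+); the variable names are assumptions based on this example:

  * Bin a continuous predictor that failed linearity; keep missing values missing.
  RECODE Age (MISSING=SYSMIS) (18 THRU HIGHEST=1) (ELSE=0) INTO age_binned.
  EXECUTE.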

How to assess multicollinearity

When producing a logistic regression model with multiple predictor (independent) variables, there is an additional assumption that must be met: multicollinearity. Multicollinearity is the assumption that the predictor variables within a multivariable model are predicting a sufficiently different aspect of the outcome (dependent) variable, so that we are not including variables accounting for the same variability. Essentially, this assumption checks that each predictor in the model is actually explaining / accounting for unique variance within the model.

To measure the amount of multicollinearity that exists between two predictor variables, we can use the variance inflation factor (VIF). As VIF values increase, the likelihood of multicollinearity being present within the model increases. VIF values < 3 indicate very low correlation between predictor variables, VIF values between 3 and 8 indicate some correlation (and a potential risk of multicollinearity), and VIF values above 8 – 10 indicate high correlation (and likely multicollinearity).

Next, let’s check the VIF scores for sex and age_binned before we check whether these can be used to predict survival. We check multicollinearity like so:

  1. Click on Analyze. Select Regression. Select Linear. (yes, linear; this isn’t a typo!)
  2. Build your model (move your outcome variable to the dependent box, and your two or more predictor variables to the independents box).
  3. Click the Statistics button. Check the “Collinearity diagnostics” box.
  4. Click Continue. Click OK.

Linear regression in SPSS, checking for multicollinearity of a logistic regression.
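The equivalent syntax sketch for this multicollinearity check (again using the linear regression procedure, with the assumed variable names survived, Sex, and age_binned):

  * VIF values for the logistic predictors come from a linear regression run on the same variables.
  REGRESSION
    /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
    /DEPENDENT survived
    /METHOD=ENTER Sex age_binned.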

Looking at the Coefficients table, in the VIF column we see low values (approximately 1.0) for both the categorized age variable and sex. Based on the cutoff values specified above, this indicates we have passed the assumption of multicollinearity, and can proceed with the regression.

Linear regression output in SPSS to check multicollinearity for a logistic regression. Here, VIF scores are low (around 1); no problem with multicollinearity.

Note: We do not consider the VIF values between the untransformed and transformed versions of the same variable (e.g., comparing the VIF of age and age_squared, if you went that route), as they are inherently correlated.

How to run a logistic regression

If you have passed all of your assumptions, you can move on to the logistic regression.

  1. Click on Analyze. Select Regression. Select Binary Logistic.
  2. Place the binary outcome variable in the “Dependent” box.
  3. Place one or more predictor variables in the “Block 1 of 1” box [the independents box].
  4. If you have one or more categorical predictors, click the Categorical button and move your categorical predictor(s) to the “Categorical Covariates” box.
  5. Click Options. Select the output you want [likely classification plots, Hosmer-Lemeshow goodness of fit, and CI for Exp(B) at 95%].
  6. Click OK.

SPSS in data view. Analyze, Regression, Binary Logistic has been selected. A multiple logistic regression with two categorical variables has been created.
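A syntax sketch of this model, assuming the outcome is survived and the two categorical predictors are Sex and age_binned (adjust the names and reference categories to your data):

  * Binary logistic regression with categorical covariates, Hosmer-Lemeshow test, classification plots, and 95% CIs.
  LOGISTIC REGRESSION VARIABLES survived
    /METHOD=ENTER Sex age_binned
    /CONTRAST (Sex)=Indicator
    /CONTRAST (age_binned)=Indicator
    /PRINT=GOODFIT CI(95)
    /CLASSPLOT.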

Interpreting the output

Running the above steps will generate the following output: the Omnibus Tests of Model Coefficients table, the Model Summary table, the Hosmer and Lemeshow Test table, and the Variables in the Equation table.

Before looking at the Variables in the Equation table (the regression proper), we first look at the other three listed tables. We expect the Step 1: Model line in the omnibus table to be statistically significant (p < .05), indicating our model (i.e., the predictor variable or variables we have chosen) does a good job predicting the outcome variable. The pseudo R2 values (Cox & Snell and / or Nagelkerke) explain approximately how much of the variance in your outcome variable is explained by the model (predictor variables) you have selected: here we see fairly high values of 26.2% - 35.4%. Finally, we expect the Hosmer-Lemeshow goodness of fit test to be non-significant (p > .05), indicating good model fit.

Logistic regression output, showing a significant model and 26.2 - 35.4% of outcome variable variance explained.

If your Step 1: Model p-value is < .05, you can move on to interpret your regression using the variables in the equation table. Here, we have ONE line for each continuous predictor, and n-1 lines for each categorical predictor (where n is the number of groups in that variable).

Logistic regression output, showing a significant effect of sex and a trending effect of age.

Important to note: the beta (“B” column) values are in log-odds units. For logistic regression, we use the Exp(B) column (as this reports the odds ratios) and the Sig. column (as this reports the p-values) to interpret the impact of the predictor variable(s) on the outcome variable.

  1. On average, and accounting for sex, children were 1.584 times more likely to survive the Titanic than adults (p = .061). This can also be written as a percentage; on average, children were 58.4% more likely to survive the Titanic than adults (p = .061).
  2. On average, and accounting for age, men were 0.086 times as likely to survive the Titanic as women (p = .0000000000000002). This can (and should) be flipped for interpretability; women were 11.6 times more likely to survive the Titanic than men. This can also be written as a percentage; on average, and accounting for age, women were 1060% more likely to survive the Titanic than men (p = .0000000000000002).
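For reference, the “flip” in point 2 is just the reciprocal of the odds ratio: 1 / 0.086 ≈ 11.6, and (11.6 − 1) × 100 ≈ 1060% (using this guide’s rounding); similarly, the 58.4% in point 1 comes from (1.584 − 1) × 100.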

Ordinal logistic regression

Ordinal logistic regression is used when we want to make predictions about an ordinal dependent variable (also called an outcome variable) based on one or more independent variables (also called predictor variables). This requires a single categorical (3 or more groups with inherent order; e.g., small / medium / large or strongly disagree to strongly agree) outcome variable, and one (simple ordinal logistic regression) or more (multiple ordinal logistic regression) predictor variables.

In this example, we investigate the relationship between the log-transformed weight of an animal’s brain and whether they would be a low, medium, or high-category sleeper.

Ordinal outcome variable

Ordinal logistic regression requires an ordinal outcome variable (i.e., the data must be categorical, with an inherent rank or order to the groups). Here, we will use animals’ time spent sleeping as our outcome variable: we can look at the “sleep_total” variable and see that it is continuous. To use this variable in an ordinal logistic regression, we can categorize the animals’ sleep into “low”, “medium”, and “high” groups:

The msleep dataset open in SPSS Data View, showing multiple species and their total sleep scores.

Expert tip: create two categorical versions of the sleep variable, using ordered numbers (e.g., 1, 2, 3) to denote the low, medium, and high sleep categories. The order of the groups matters (e.g., you can set this to low then medium then high, or high then medium then low, but you cannot use medium then low then high, as the order would be broken). In Variable View, ensure the “Measure” column is set to Scale for one of the categorical versions of sleep (sleep_numeric, for assumption checking purposes), and to Ordinal for the other (sleep_cat, for conducting the analysis):

Three rows in SPSS's Variable View, showing the msleep dataset's total sleep variable, and two categorical versions of the total sleep variable.
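If you want to create these categorized sleep variables in syntax, a sketch follows; the cut-points (8 and 12 hours) are purely illustrative assumptions, not values from this guide, so choose cut-points that make sense for your data:

  * Categorize total sleep into 1 = low, 2 = medium, 3 = high (illustrative cut-points only).
  RECODE sleep_total (MISSING=SYSMIS) (LOWEST THRU 8=1) (8 THRU 12=2) (ELSE=3) INTO sleep_numeric.
  COMPUTE sleep_cat = sleep_numeric.
  VALUE LABELS sleep_numeric sleep_cat 1 'low' 2 'medium' 3 'high'.
  * One copy is treated as scale (for assumption checking), the other as ordinal (for the analysis).
  VARIABLE LEVEL sleep_numeric (SCALE).
  VARIABLE LEVEL sleep_cat (ORDINAL).
  EXECUTE.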

We will use sleep_cat (i.e., the ordinal version of sleep_total) for the ordinal logistic regression, which is an ordinal outcome variable, so we have passed this assumption.

A note about the predictor variable

We can check our predictor variable (brain weight) and plot it against our ordinal outcome variable (sleep_cat) using Graphs > Chart Builder. Select “Boxplot” from the bottom-left, and drag the first graph option to the blue text in the top middle section of the screen. Place your continuous predictor variable on the y-axis and your categorical outcome variable on the x-axis, then press OK. The Chart Builder dialogue box should look something like this:

The Chart Builder in SPSS, showing a boxplot with the msleep dataset's brain weight on the y-axis and the categorical sleep variable on the x-axis.
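Note: if you prefer syntax, one alternative to Chart Builder is the Explore procedure, which will also draw boxplots of a continuous variable split by a categorical variable (variable names assumed from the msleep example):

  * Boxplots of brain weight for each sleep category.
  EXAMINE VARIABLES=brainwt BY sleep_cat
    /PLOT=BOXPLOT
    /STATISTICS=NONE
    /NOTOTAL.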

The resulting boxplot (see below) shows several values that fall far from the rest of the data; the data points marked with a star are considered extreme outliers on this scale.

An SPSS boxplot, with the msleep dataset's brain weight on the y-axis and the categorical sleep variable on the x-axis. There are multiple outliers identified.

Maybe the scale we are using isn’t the best choice! Let’s try log-transforming the brain weight variable using the ln() function in Transform > Compute Variable. Write your new variable name in the “Target Variable” box, and include your formula for calculating the log of brain weight in the “Numeric Expression” box. Click OK.

The Compute Variable command in SPSS, showing the calculation for log-transforming the brain weight variable from the msleep dataset.
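In syntax, this transformation is a single Compute statement (assuming the brain weight column is named brainwt):

  * Natural-log transform of brain weight.
  COMPUTE log_brainwt = LN(brainwt).
  EXECUTE.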

When we create the boxplot again (Graphs > Chart Builder) with the log-transformed variable, we see a log scale does a much better job displaying our results; the boxplot is no longer squished, and there do not appear to be any obvious outliers. To keep all of our data in the analysis, let’s proceed using the log-transformed brain weight for this analysis.

An SPSS boxplot, with the msleep dataset's log-transformed brain weight on the y-axis and the categorical sleep variable on the x-axis. There are no outliers identified.

Independence of observations

Independence of observations means that each observation must be entirely independent of the others. To assess independence, we can look at our data; we can see that each observation is a unique species of animal (i.e., each observation is independent of the others), thereby passing the assumption of independence of observations.

The msleep dataset open in SPSS Data View, showing multiple species and their total sleep scores.

Proportional odds or parallel lines

The assumption of proportional odds (sometimes called the test of parallel lines), briefly, assumes that the effect of each predictor variable must be identical at each partition of the data. In other words, the ‘slope’ value for a predictor variable within an ordinal logistic regression model must not change across the different categorized levels of the outcome variable. We can easily check this assumption when we run the regression model in SPSS (see “Interpreting the Output” section, below).

Multivariable ordinal logistic regression: Testing assumptions

In the following example, we want to add an additional predictor variable (body weight) to the previous model (predictor: log brain size; outcome: categorized sleep). Body weight is in the “bodywt” column, and is a continuous variable.

Multicollinearity

When producing an ordinal logistic regression model with multiple predictor (independent) variables, there is an additional assumption that must be met prior to analyzing the model’s output: multicollinearity. Multicollinearity is the assumption that the predictor variables within a multivariable model are predicting a sufficiently different aspect of the outcome (dependent) variable, so that we are not including variables accounting for the same variability. Essentially, this assumption checks that each predictor in the model is actually explaining / accounting for unique variance within the model.

To measure the amount of multicollinearity that exists between two predictor variables, we can use the variance inflation factor (VIF). As VIF values increase, the likelihood of multicollinearity being present within the model increases. VIF values < 3 indicate very low correlation between predictor variables, VIF values between 3 and 8 indicate some correlation (and a potential risk of multicollinearity), and VIF values above 8 – 10 indicate high correlation (and likely multicollinearity).

Next, let’s check the VIF scores for log_bodywt and log_brainwt before we check whether these can be used to predict sleep. We check multicollinearity like so:

  1. Click on Analyze. Select Regression. Select Linear. (yes, linear; this isn’t a typo!)
  2. Build your model (move your outcome variable to the dependent box, and your two or more predictor variables to the independents box). Note: we want to use the binned (i.e., categorical) version of the outcome variable here that is set to “Scale” as the data type.
  3. Click the Statistics button. Check the “Collinearity diagnostics” box.
  4. Click Continue. Click OK.

The Linear Regression command in SPSS, showing the set-up for checking multicollinearity for a multiple ordinal logistic regression.
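As a syntax sketch, this check mirrors the one shown earlier for logistic regression; here sleep_numeric is the “Scale” copy of the binned outcome, and log_brainwt / log_bodywt are assumed to be natural-log transformed versions of the brain and body weight columns:

  * VIF values for the two continuous predictors.
  REGRESSION
    /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
    /DEPENDENT sleep_numeric
    /METHOD=ENTER log_brainwt log_bodywt.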

Looking at the Coefficients table, in the VIF column we see very high (>10) VIF scores; this makes sense, as animals with larger bodies often have larger brains. Based on the cutoff values specified above, this indicates that we have failed the assumption of multicollinearity; in order to appropriately run the regression, we would need to remove one of these two predictor variables. For the purposes of demonstration, we will continue with the analysis, pretending we have passed the assumption of multicollinearity.

The output of the linear regression command in SPSS, showing the VIF scores (used to assess multicollinearity) for the multiple ordinal logistic regression.

How to run a simple ordinal logistic regression

If you have passed all of your assumptions (note: we still need to check the test of parallel lines), you can move on to running the simple ordinal logistic regression.

  1. Click on Analyze. Select Regression. Select Ordinal.

SPSS in Data View. The "Analyze" tab is expanded, with Regression > Ordinal selected.

  2. Place the categorical (3+ levels) outcome variable in the “Dependent” box.
  3. Place your singular predictor variable in the “Factor(s)” box if it is categorical, or in the “Covariate(s)” box if it is continuous.
  4. Click the “Output” button; ensure “Test of parallel lines” is selected.

The Ordinal Regression command in SPSS. The categorical outcome variable is in the "Dependent" box, the simple log-transformed continuous predictor variable is in the "Covariate(s)" box, and the Test of parallel lines has been selected in the Output screen.

  5. Click Continue. Click OK.
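A syntax sketch of this simple ordinal (PLUM) model, assuming the variable names used above:

  * Ordinal logistic regression with a logit link and the test of parallel lines.
  PLUM sleep_cat WITH log_brainwt
    /LINK=LOGIT
    /PRINT=FIT PARAMETER SUMMARY TPARALLEL.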

Interpreting the output

Running the above steps will generate the following output: Case Processing Summary, Model Fitting Information, Goodness-of-Fit, Pseudo R-Square, Parameter Estimates, and Test of Parallel Lines.

Before looking at the Parameter Estimates table (the regression proper), we first look at the other tables. Let’s start with the Test of Parallel Lines table (i.e., the assumption of proportional odds), which is at the very bottom of the output:

The test of parallel lines output from the SPSS ordinal logistic regression command, showing a non-significant (p > .05) value, indicating the assumption has passed.

Here, a non-significant (p > .05) value in the “Sig.” column indicates we have passed the assumption of proportional odds. If this was statistically significant (p < .05), this would indicate we have failed the assumption. If we fail this assumption, we should not use or interpret the results of the ordinal logistic regression.

Next, we can quickly look at the other tables:

SPSS ordinal logistic regression output, showing: a significant model fitting value (p < .05) indicating the model fits the data, a non-significant (p > .05) goodness-of-fit test, and a 12.6 - 26.6% pseudo r-squared value.

We see a Warning, indicating we have several combinations with missing frequencies (i.e., not every log brain weight value is represented in each sleep category). Next, the Case Processing Summary table shows us the breakdown of our data; here we can see some missing data, as some animals did not have a reported brain weight (and were excluded from the analysis). The Model Fitting Information table shows us whether the model is doing a good job (p < .05 in the “Sig.” column) or not (p > .05 in the “Sig.” column) in fitting the data; here we require a significant value (p < .05) in order to interpret the regression. In the Goodness-of-Fit table, however, we expect non-significant values (p > .05) in order to interpret the regression. The Pseudo R-Square table gives us three estimates of approximately how much variance in the outcome variable is explained by the model we have built (i.e., the predictor variables); here, we see that log brain weight explains ~12.6 – 26.6% of the variance of sleep category.

After investigating all of the other tables, we can read the results of the simple ordinal logistic regression in the Parameter Estimates table:

SPSS ordinal logistic regression output, showing a significant log brain weight value (p < .05) for the regression.

We generally ignore any “Threshold” rows, and instead look at the “Location” row(s). For continuous predictors, we should have one row per predictor; for categorical predictors, we should have one row for each category, with the “blank” row serving as the reference category.

The “Sig.” value is the p-value indicating whether the predictor variable is statistically significant (p < .05) in its ability to predict the value of the outcome variable. The “Estimate” value is the LOG ODDS change in the relationship, or slope, produced by the specific coefficient accounting for other variables in the model. As log odds are unintuitive, we generally exponentiate them into odds ratios for reporting / interpretation. When we exponentiate the Estimate (log odds) value of -.422, we get an odds ratio value of 0.656. We use odds ratios (in combination with p-values) to interpret the impact of the predictor variable(s) on the outcome variable.
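If you want SPSS to do the exponentiation for you, the Compute Variable command has an EXP() function (the new variable name here is purely illustrative, and any scientific calculator works just as well):

  * Convert a log-odds estimate to an odds ratio: exp(-0.422) = 0.656.
  COMPUTE or_log_brainwt = EXP(-0.422).
  EXECUTE.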

Here, we see the odds ratio (i.e., the exponentiated log odds) for the log-transformed brain weight predictor variable is .656. This value is less than 1, meaning we can find the percentage decrease in the odds like so: [(1 - .656) * 100] = 34.4%. As our variable was statistically significant, we can interpret the regression as follows: for each one-unit increase in the log-transformed brain weight variable (i.e., the predictor variable), the odds of being in a higher sleep category (i.e., the outcome variable) decreased by 34.4%, on average. To say this in plain-er English, animals with higher log brain weights, on average, needed less sleep than animals with lower log brain weights.

Expert tip: if your odds ratio is greater than 1, you would instead see a percentage increase in the odds of being in a different category of the outcome variable.

To report this result, we would say something along the lines of: On average, animals with a larger log-transformed brain weight were more likely (34.4%) to be in a lower sleep category (p < .001).

How to run a multiple ordinal logistic regression

If you have passed all of your assumptions (note: we still need to check the test of parallel lines), you can move on to running the multiple ordinal logistic regression.

  1. Click on Analyze. Select Regression. Select Ordinal.

SPSS in Data View. The "Analyze" tab is expanded, with Regression > Ordinal selected.

  2. Place the categorical (3+ levels) outcome variable in the “Dependent” box.
  3. Place your predictor variables in the “Factor(s)” box if they are categorical, or in the “Covariate(s)” box if they are continuous.
  4. Click the “Output” button; ensure “Test of parallel lines” is selected.

The Ordinal Regression command in SPSS. The categorical outcome variable is in the "Dependent" box, the two log-transformed continuous predictor variables are in the "Covariate(s)" box, and the Test of parallel lines has been selected in the Output screen.

  5. Click Continue. Click OK.
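The matching syntax sketch for the multiple ordinal model, again assuming the variable names used in this example:

  * Ordinal logistic regression with two continuous predictors and the test of parallel lines.
  PLUM sleep_cat WITH log_brainwt log_bodywt
    /LINK=LOGIT
    /PRINT=FIT PARAMETER SUMMARY TPARALLEL.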

Interpreting the output

Running the above steps will generate the following output: Case Processing Summary, Model Fitting Information, Goodness-of-Fit, Pseudo R-Square, Parameter Estimates, and Test of Parallel Lines.

Before looking at the Parameter Estimates table (the regression proper), we first look at the other tables. Let’s start with the Test of Parallel Lines table (i.e., the assumption of proportional odds), which is at the very bottom of the output:

The test of parallel lines output from the SPSS ordinal logistic regression command, showing a non-significant (p > .05) value, indicating the assumption has passed.

Here, a non-significant (p > .05) value in the “Sig.” column indicates we have passed the assumption of proportional odds. If this was statistically significant (p < .05), this would indicate we have failed the assumption. If we fail this assumption, we should not use or interpret the results of the ordinal logistic regression.

Next, we can quickly look at the other tables:

SPSS ordinal logistic regression output, showing: a significant model fitting value (p < .05) indicating the model fits the data, a non-significant (p > .05) goodness-of-fit test, and a 13.1 - 27.7% pseudo r-squared value.

We see a Warning, indicating we have several combinations with missing frequencies (i.e., not every log brain weight value and / or log body weight value is represented in each sleep category). Next, the Case Processing Summary table shows us the breakdown of our data; here we can see some missing data, as some animals did not have a reported brain weight and / or a reported body weight (and were excluded from the analysis). The Model Fitting Information table shows us whether the model is doing a good job (p < .05 in the “Sig.” column) or not (p > .05 in the “Sig.” column) in fitting the data; here we require a significant value (p < .05) in order to interpret the regression. In the Goodness-of-Fit table, however, we expect non-significant values (p > .05) in order to interpret the regression. The Pseudo R-Square table gives us three estimates of approximately how much variance in the outcome variable is explained by the model we have built (i.e., the predictor variables); here, we see that log brain weight and log body weight together explain ~13.1 – 27.7% of the variance of sleep category.

After investigating all of the other tables, we can read the results of the multiple ordinal logistic regression in the Parameter Estimates table:

SPSS ordinal logistic regression output, showing a non-significant log brain weight value (p > .05) and a non-significant log body weight value (p > .05) for the regression.

We generally ignore any “Threshold” rows, and instead look at the “Location” row(s). For continuous predictors, we should have one row per predictor; for categorical predictors, we should have one row for each category, with the “blank” row serving as the reference category.

The “Sig.” value is the p-value indicating whether the predictor variable is statistically significant (p < .05) in its ability to predict the value of the outcome variable. The “Estimate” value is the LOG ODDS change in the relationship, or slope, produced by the specific coefficient accounting for other variables in the model. As log odds are unintuitive, we generally exponentiate them into odds ratios for reporting / interpretation. When we exponentiate the Estimate (log odds) values of -.105 and -.267, we get odds ratio values of 0.899 and 0.766. We use odds ratios (in combination with p-values) to interpret the impact of the predictor variable(s) on the outcome variable.

Here, we see the odds ratio (i.e., the exponentiated log odds) for the log-transformed brain weight predictor variable is .899. This value is less than 1, meaning we can find the percentage decrease in the odds like so: [(1 - .899) * 100] = 10.1%. If our variables had been statistically significant, we would interpret the regression as follows: for each one-unit increase in the log-transformed brain weight variable (i.e., the predictor variable), the odds of being in a higher sleep category (i.e., the outcome variable) decreased by 10.1%, on average. Similarly, the log-transformed body weight predictor variable had a 23.4% decrease in the odds. However, as our variables were not statistically significant, we would not interpret these results further.

Expert tip: if your odds ratio is greater than 1, you would instead see a percentage increase in the odds of being in a different category of the outcome variable.

To report this result (had it been statistically significant), we would say something along the lines of: On average, and accounting for the log-transformed body weight, animals with a larger log-transformed brain weight were more likely (10.1%) to be in a lower sleep category (p = .793).


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.