
Analyze Data: R and RStudio

Contributors: Lindsay Plater and Michelle Dollois

The interface

The RStudio interface has four main sections: the Source window (top left), the Environment / History / Connections / Tutorial window (top right), the Console / Terminal / Jobs window (bottom left), and the Files / Plots / Packages / Help / Viewer window (bottom right).

A screenshot of a basic (empty) RStudio interface, with the Source Window top left, the Environment Window top right, the Console Window bottom left, and the Files Window bottom right.

The sizes of these windows are adjustable if you drag and drop the edges. If you are missing a window, it may be minimized (click the boxes in the top right of the section to expand / collapse certain windows). If you are missing the source window, you may need to open a file or create a new file by clicking the white sheet with the green plus sign (top left symbol on the toolbar).

The Source Window

The source window, normally located at the top-left of RStudio, is where you will likely do most of your work in RStudio. Here, you can open or write code / scripts, run code / scripts, view data, and more. This window is text-based only. To run code in the source window, highlight the line(s) you wish to run and either click "Run" (top right of the source window) or press CTRL + ENTER (Cmd + Return on macOS).

The Environment

The environment window, normally located at the top-right of RStudio, displays which objects are currently available for computation in your workspace. Datasets and variables you import or generate will be displayed here. To view this data in the source window, you can double-click on it or run the line View(NAMEOFDATA) (where “NAMEOFDATA” is what you named your dataset when you imported it / created it).

This window also includes tabs for history, connections, and tutorial. The history tab keeps a record of all code that has been run this session, including computations in the console. The connections tab is for accessing external databases. The tutorial tab includes more information for learning R / RStudio.
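The objects shown in the Environment tab can also be inspected from code. The lines below are a minimal sketch using base R functions; the data frame my_data and its values are made up for illustration and are not part of this guide's dataset.

```r
# A small, made-up data frame standing in for an imported dataset
my_data <- data.frame(id = 1:3, score = c(4.5, 3.2, 5.0))

# List the objects currently in the workspace
# (the same objects the Environment tab displays)
ls()

# Compact summary of the structure: column names, types, first values
str(my_data)

# Print the first rows in the console
head(my_data)

# Open the spreadsheet-style viewer; this only makes sense
# in an interactive session such as RStudio
if (interactive()) View(my_data)
```

str() and head() are handy when you only need a quick look and do not want to open a new tab in the source window.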

The Console Window

The console window, normally located at the bottom-left of RStudio, is where you can type code that you do not wish to save. For example, if you type 10 + 10 in the console and press ENTER, the console will output “20” as the answer. This is also where the code that you run in the source window might generate output. 

If you see a blue greater than (>) symbol, this means the console is ready and you can run code. If you instead see a blue plus sign (+), if there is no symbol in the console, or if the top right of the console shows a red stop sign, the console is not ready and you cannot run additional code. Either your previous code has broken in some way (a plus sign usually means R is waiting for the rest of an incomplete command, such as a missing closing bracket; press ESC to cancel it, or restart your session if needed), or your previous code is still running (you can force it to stop by clicking on the stop sign).

The Files Window

The files window, normally located at the bottom-right of RStudio, shows you the files and folders in the current directory. It's important to set your working directory properly each session to ensure your code runs as expected (e.g., when importing or exporting files; for help on setting your working directory, check the Importing data section).

This window also includes tabs for plots, packages, help, and viewer. The plots tab is where graphs will appear if you run code to generate graphs. The packages tab lists the packages you have installed and shows (with a checkmark) which ones are currently loaded. The help tab links to the R Documentation, and you can search functions, packages, databases, and more for additional support. The viewer tab is for dynamic data visualizations / websites generated with R.

Importing data

In addition to R script files (.R), you can open many kinds of data files in R, including Excel files, comma-delimited text files, and data files from other software packages (e.g., SPSS, SAS, Stata).

Before you import a file, be sure to set your working directory so R / RStudio knows where to look for the file. There are several ways to do this, but one of the simplest is to click the Session tab, click Set Working Directory, then click Choose Directory. You can now navigate through your computer folders to the folder where you would like to import / export data, then click "Open". When you set your working directory, the console will generate the line of code used, likely setwd("C:/PATH/") (where "PATH" is the pathway on your computer). Note that if you type or paste a Windows path yourself, you will need to change the back-slashes (\) to forward-slashes (/), because R treats a single back-slash inside quotes as an escape character. If you wish to verify your current working directory, run the code getwd().
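The same steps can be written as code in a script. This is a minimal sketch: tempdir() is used only so the example runs on any computer, and in practice you would substitute your own folder (e.g., "C:/PATH/"). The name old_wd is illustrative.

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()

# Point R at a folder; tempdir() is a stand-in for your own path
setwd(tempdir())

# Confirm where R is now looking for files
getwd()

# List the files R can see in the working directory
list.files()

# Return to the original directory
setwd(old_wd)
```

Placing the setwd() line near the top of a script keeps a record of which folder the files came from.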

Opening a dataset (Source Window, recommended)

The recommended way to open a file in RStudio is to write code in the Source Window (top left). This approach is recommended because you can save your code (i.e., this keeps a record of your steps for yourself / others, so you know precisely which file at which location was used). As with most tasks, there are numerous ways to open files in R. You may wish to try these methods:

  • For Excel files, be sure to install and load the "readxl" package. Run the line NAMEIT <- read_excel("NAMEOFDATA.xlsx") (where NAMEIT is what you wish to call your data, and NAMEOFDATA is the name of your Excel file).
  • For SPSS files, be sure to install and load the "haven" package. Run the line NAMEIT <- read_sav("NAMEOFDATA.sav") (where NAMEIT is what you wish to call your data, and NAMEOFDATA is the name of your SPSS file).
  • For SAS files, be sure to install and load the "haven" package. Run the line NAMEIT <- read_sas("NAMEOFDATA.sas7bdat") (where NAMEIT is what you wish to call your data, and NAMEOFDATA is the name of your SAS file).
  • For Stata files, be sure to install and load the "haven" package. Run the line NAMEIT <- read_dta("NAMEOFDATA.dta") (where NAMEIT is what you wish to call your data, and NAMEOFDATA is the name of your Stata file).
  • For CSV files, you can use a base R function (note: for large datasets, consider using the readr or data.table packages). Run the line NAMEIT <- read.csv("NAMEOFDATA.csv") (where NAMEIT is what you wish to call your data, and NAMEOFDATA is the name of your CSV file).
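As an illustration of the CSV case, the sketch below first writes a tiny made-up CSV file to a temporary folder (so the example is self-contained) and then imports it with read.csv(). The column names and values are invented; with your own data you would skip the first step and point read.csv() at your file.

```r
# Create a small CSV file in a temporary folder so the example runs anywhere
csv_path <- file.path(tempdir(), "NAMEOFDATA.csv")
writeLines(c("id,group,score",
             "1,a,4.5",
             "2,b,3.2",
             "3,a,5.0"),
           csv_path)

# Import the file with base R and store it under the name NAMEIT
NAMEIT <- read.csv(csv_path)

# Check that the import looks as expected
str(NAMEIT)
head(NAMEIT)

# The other formats follow the same pattern once the package is installed.
# These lines are commented out because the files are not created here.
# library(readxl); NAMEIT <- read_excel("NAMEOFDATA.xlsx")
# library(haven);  NAMEIT <- read_sav("NAMEOFDATA.sav")
```

Running str() right after an import is a quick way to confirm that numeric columns were read as numbers rather than text.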

Opening a dataset (“File” button, not recommended)

One basic way to open a file in RStudio is to use the tabs on the ribbon. You may do so as follows:

  1. Click on File (top left button). Select Open File.
  2. To view all files, ensure “All Files” is listed in the drop-down menu.
  3. In the “Open File” dialog box, select the file you want to open.
  4. Click Open.

Opening a dataset (“Files” Window, not recommended)

Another way to open a file in RStudio is to use the Files Window (bottom right). You may do so as follows:

  1. Click on the “Files” tab in the Files Window.
  2. Navigate through the folders to your desired file. Select the file you want to open, and click “Import Dataset”.
  3. Ensure the Data Preview looks as expected, then click “Import”.

Installing packages

Often you may wish to use functions written by others. The first step is to install the package you wish to use; you only need to install a package the first time you use it. To install a package, run the line install.packages("NAMEOFPACKAGE") (where "NAMEOFPACKAGE" is the package you wish to install, e.g., "tidyverse"). The next step is to load the package in the current session; you must load the package each time you start a new session if you wish to use it. To load a package, run the line library(NAMEOFPACKAGE) (where NAMEOFPACKAGE is the package you wish to load, e.g., tidyverse).

A screenshot of an RStudio interface, with several commented lines (green text) and two lines of code in the Source Window: install.packages("tidyverse") and library(tidyverse).  The Console Window has output indicating that the library(tidyverse) line was run successfully.
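Because installing is needed only once, a script can check whether a package is already present before installing it. The lines below are one possible sketch, not the only way to do this. The package "stats" ships with R and is used here purely as a stand-in so the example runs without an internet connection; you would replace it with the package you need (e.g., "tidyverse"). The variable name pkg is illustrative.

```r
# Name of the package to use; "stats" is a stand-in that ships with R
pkg <- "stats"

# Install the package only if it is not already installed
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package for this session;
# character.only = TRUE lets library() accept the name stored in a variable
library(pkg, character.only = TRUE)
```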

Computing variables

It's easy to create new variables in R. If you already have a dataframe (i.e., if you've already imported a file), you can add a new column using the assignment NAMEIT$NEWCOLUMN <- CODE (where NAMEIT is the name of the dataframe you imported, NEWCOLUMN is the name of the column you wish to create, and CODE is the code you would like to run). You can use this to compute values based on numeric transformations of other variables:

  • add, subtract, divide, multiply, or square the values in one or more columns
  • convert measurements (e.g., weight from pounds to kilograms)

As an example, in the code below the “dataset.csv” file has been opened. You can see this stored as “df” in the global environment (top right). A new column has been created in this dataframe, called “mean_fakedata”, which is the average of three columns of data in the df dataframe.

A screenshot of an RStudio interface, with several commented lines (green text) and several black lines of code in the Source Window. A file has been imported, and a new column has been added.  The Console Window has blue output indicating that the code has been run successfully.
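For readers who cannot see the screenshot, a row-wise average of three columns can be computed along these lines. This is a sketch only: the column names fakedata1, fakedata2, and fakedata3 and their values are assumptions, and the dataframe is built by hand here instead of being read from "dataset.csv".

```r
# A made-up dataframe standing in for the imported "dataset.csv"
df <- data.frame(fakedata1 = c(2, 4, 6),
                 fakedata2 = c(4, 6, 8),
                 fakedata3 = c(6, 8, 10))

# Average the three columns within each row and store the result as a new column
df$mean_fakedata <- rowMeans(df[, c("fakedata1", "fakedata2", "fakedata3")])

# Print the dataframe with the new column
df
```

If some values are missing, rowMeans() accepts na.rm = TRUE to average only the values that are present.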

We can view the new column by double-clicking on “df” in the global environment, or by typing the code View(df).

A screenshot of an RStudio interface, with the dataframe open to see the data in a spreadsheet format. A new column (mean_fakedata) has been computed.

Saving data

IMPORTANT NOTE: many of the changes you make to the dataframe will ONLY change the dataframe, not the original file. If you want a copy of the final file (i.e., with any changes you've made in R), you will need to save it to your computer. This is simple to do with write.csv(DATAFRAMENAME, "PATH/FILENAME.csv"), where DATAFRAMENAME is what you have called your dataframe, PATH is the pathway on your computer, and FILENAME is what you would like the file to be called. Saving the file as a .csv means it is easy to open in other software. In this screenshot, PATH was not specified because the working directory was already set.

A screenshot of the RStudio interface side-by-side with the open NEWdataset.csv file. The write.csv() function was used to save the changes as a new file.
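A minimal sketch of saving and then checking a file is shown below. The dataframe is made up, and the file is written to a temporary folder only so the example runs on any computer; in practice you would use your own path or rely on the working directory. The names out_path and check are illustrative.

```r
# A small, made-up dataframe standing in for your edited data
df <- data.frame(id = 1:3, score = c(4.5, 3.2, 5.0))

# Build the output path; tempdir() is a stand-in for your own folder
out_path <- file.path(tempdir(), "NEWdataset.csv")

# Save the dataframe; row.names = FALSE leaves out the extra
# column of row numbers that write.csv() adds by default
write.csv(df, out_path, row.names = FALSE)

# Read the file back in to confirm it was saved as expected
check <- read.csv(out_path)
head(check)
```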


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.