
## Scatterplot Overview

A scatterplot displays observations as points on a two-dimensional grid, with one variable on each axis.

## Uses

Use the scatterplot to visualize the relationship between two quantitative variables (see Wikipedia).

## Code: Scatterplot in R

Below, I’m going to show you some simple code to create a scatterplot in R using the ggplot2 package. To run it, you’ll need to have R and ggplot2 installed.

Here’s the code to create a simple scatterplot in R.

```
# Load ggplot2 package for graphics/plotting
library(ggplot2)

# Create dummy dataset: two noisy versions of the sequence 1 to 50
df.test_data <- data.frame(x_var = 1:50 + rnorm(50, sd = 15),
                           y_var = 1:50 + rnorm(50, sd = 2))

# Plot data using ggplot2
ggplot(data = df.test_data, aes(x = x_var, y = y_var)) +
  geom_point()
```

## Results

And here’s the output you should see. Note that your plot will look slightly different: the points will be in different locations because we used rnorm() to create random, normally distributed data.
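If you’d like the same plot every time you run the code, one option is to fix R’s random seed before generating the data. The sketch below is not part of the original example, and the seed value 42 is an arbitrary assumption; any integer works.

```r
library(ggplot2)

# Assumption: 42 is an arbitrary seed chosen for illustration
set.seed(42)

# Same dummy dataset as above; rnorm() now draws a repeatable sequence
df.test_data <- data.frame(x_var = 1:50 + rnorm(50, sd = 15),
                           y_var = 1:50 + rnorm(50, sd = 2))

# Re-running from set.seed() onward regenerates identical data,
# so the scatterplot looks the same on every run
p_seeded <- ggplot(data = df.test_data, aes(x = x_var, y = y_var)) +
  geom_point()
print(p_seeded)
```

Leave set.seed() out if you actually want fresh random data on each run.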

## Explanation

Don’t be intimidated by this if you’re just starting out. There’s an underlying structure to this code. Once you understand that structure, this will make sense. Moreover, once you understand that structure, data visualization generally will make much more sense.

Let’s take this line by line:

First, we just load the ggplot2 package (if you’re a beginner, ggplot2 is a graphics package that we use for data visualization).

```
library(ggplot2)
```

Next, we call the ggplot() plotting function:

```
ggplot(data = df.test_data, aes(x = x_var, y = y_var))
```

Here, the data=df.test_data argument tells the ggplot() function that we’re going to plot data from the df.test_data data frame.
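To see what ggplot() is being handed, it can help to inspect the data frame first. This is a minimal sketch using base R’s head() and str(), neither of which appears in the original code:

```r
# Recreate the dummy dataset from the example above
df.test_data <- data.frame(x_var = 1:50 + rnorm(50, sd = 15),
                           y_var = 1:50 + rnorm(50, sd = 2))

# Show the first six rows; the values differ per run because of rnorm()
head(df.test_data)

# Summarize the structure: 50 observations of 2 numeric variables
str(df.test_data)
```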

Next, there’s an important piece of code that calls the aes() function. In short, aes() maps variables in your data frame to “aesthetic” attributes in the graph.

So, if we were to simplify that expression, one piece at a time, aes(x=x_var) would indicate that we want to plot the variable x_var on the x-axis; more technically, we are mapping the variable x_var to the x-axis. Similarly, we are mapping the variable y_var to the y-axis.

This is how we create plots with ggplot. We just write code to map variables in our data frame to aesthetic features of the plot.
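The same mechanism extends beyond the two axes. As a sketch, the code below also maps y_var to the colour aesthetic. This extra mapping is an illustrative addition, not part of the original example.

```r
library(ggplot2)

# Recreate the dummy dataset from the example above
df.test_data <- data.frame(x_var = 1:50 + rnorm(50, sd = 15),
                           y_var = 1:50 + rnorm(50, sd = 2))

# x_var -> x-axis, y_var -> y-axis,
# and (illustrative addition) y_var -> point colour
p_colour <- ggplot(data = df.test_data,
                   aes(x = x_var, y = y_var, colour = y_var)) +
  geom_point()
print(p_colour)
```

Each argument inside aes() follows the same pattern: an aesthetic on the left, a data frame variable on the right.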

Next, we indicate what sorts of geometric objects we’ll be adding to the plot (called “geoms” in the ggplot syntax).

```
  geom_point()
```

What does this do? It just indicates to the ggplot() function that you want to plot “points” (or point geometric objects; AKA point “geoms”).

The terminology might be a little confusing at first. Again, a “geom” is just a geometric object. Examples of geometric objects are points, lines, and bars.
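To make the idea concrete, here is a sketch that keeps the data and the aesthetic mappings fixed and swaps only the geom. geom_line() is a standard ggplot2 function, but using it here is an illustration rather than part of the original example.

```r
library(ggplot2)

# Recreate the dummy dataset from the example above
df.test_data <- data.frame(x_var = 1:50 + rnorm(50, sd = 15),
                           y_var = 1:50 + rnorm(50, sd = 2))

# Base plot: data and aesthetic mappings only, no geom yet
p_base <- ggplot(data = df.test_data, aes(x = x_var, y = y_var))

# Same mappings drawn as points
p_points <- p_base + geom_point()

# Same mappings drawn as a line connecting observations in order of x_var
p_lines <- p_base + geom_line()

print(p_points)
print(p_lines)
```

The mappings never change between the two plots; only the geometric object used to draw them does.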