
[Image: San Francisco crime map, 2014, made with ggplot2]

When I was working as a data scientist at Apple in Silicon Valley, I’d drive up to San Francisco on nights and weekends to meet a girl for dinner or go to a meetup.

I sort of fell in love with the city, and I've recently been browsing the datasets on DataSF so that I can do some geospatial visualizations of San Francisco.

As it turns out, crime data for San Francisco is readily available, much as it is for Chicago and Philadelphia. So, I downloaded the crime data and started visualizing it in R using ggplot2.

The map above is the result. It shows 2014 SF crime data through mid-December.

What’s remarkable is that the plotting code (the code that creates the map itself) is only 12 lines long. And of those 12 lines, the vast majority are just formatting and subtle tweaks to aesthetic features to give the map the look I wanted. (The look and feel was originally inspired by a map of London bike and pollution data at spatial.ly.)


library(ggplot2)

#################################
# GET CRIME DATA AND SF GEO DATA
#################################


#------------------------------------------
# Download the zipped SF crime data (2014)
#  and save it to the working directory
#------------------------------------------
download.file("https://vrzkj25a871bpq7t1ugcgmn9-wpengine.netdna-ssl.com/wp-content/uploads/2014/12/sf_crime_YTD-2014-12_REDUCED.txt.zip", destfile="sf_crime_YTD-2014-12_REDUCED.txt.zip")

#------------------------------
# Unzip the SF crime data file
#------------------------------
unzip("sf_crime_YTD-2014-12_REDUCED.txt.zip")

#------------------------------------
# Read crime data into an R dataframe
#------------------------------------
df.sf_crime <- read.csv("sf_crime_YTD-2014-12_REDUCED.txt")

#------------------------------
# Download water boundaries
#  and neighborhood boundaries
#------------------------------
df.sf_neighborhoods <- read.csv(url("https://vrzkj25a871bpq7t1ugcgmn9-wpengine.netdna-ssl.com/wp-content/uploads/2014/12/sf_neighborhood_boundaries.txt"))
df.sf_water <- read.csv(url("https://vrzkj25a871bpq7t1ugcgmn9-wpengine.netdna-ssl.com/wp-content/uploads/2014/12/sf_water_boundaries.txt"))



################
# PLOT THE DATA
################
ggplot() +
  geom_polygon(data=df.sf_neighborhoods, aes(x=long, y=lat, group=group), fill="#404040", colour="#5A5A5A", lwd=0.05) +
  geom_polygon(data=df.sf_water, aes(x=long, y=lat, group=group), colour="#708090", fill="#708090") +
  geom_point(data=df.sf_crime, aes(x=X, y=Y), color="#FFFF3309", fill="#FFFF3309", size=1.3) +
  geom_polygon(data=df.sf_neighborhoods, aes(x=long, y=lat, group=group), fill=NA, colour="#DDDDDD55", lwd=.3) +
  ggtitle("San Francisco Crime (2014)") +
  theme(panel.background = element_rect(fill="#708090")) +
  theme(axis.title = element_blank()) +
  theme(axis.text = element_blank()) +
  theme(axis.ticks = element_blank()) +
  theme(panel.grid = element_blank()) +
  theme(plot.title = element_text(family="Trebuchet MS", size=38, face="bold", hjust=0, color="#777777"))





 
Said differently, creating maps in R using ggplot2 is not that difficult. You just need to understand how ggplot2 works.

As I’ve said before, ggplot2 has a deep syntactical structure. Once you know that structure, seemingly complex visualizations become much, much easier to create. In fact, owing to the deep structure of how ggplot2 works, this map is basically just a sophisticated scatterplot.
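To make the "sophisticated scatterplot" point concrete, here is a minimal sketch using made-up data. The data frame `df.points` and its values are assumptions for illustration only; the `X` and `Y` column names simply mirror the crime data used above. The idea is that longitude on the x-axis and latitude on the y-axis gives an ordinary scatterplot, and the "map" look comes mostly from theme changes.

```r
library(ggplot2)

# Synthetic stand-in for the crime data: random points in a box that
# roughly covers San Francisco. The values are made up for illustration.
set.seed(42)
df.points <- data.frame(
  X = runif(500, min = -122.51, max = -122.36),  # longitude
  Y = runif(500, min = 37.71, max = 37.81)       # latitude
)

# An ordinary scatterplot: longitude on x, latitude on y
p.scatter <- ggplot(data = df.points, aes(x = X, y = Y)) +
  geom_point(color = "#FFFF33", alpha = 0.3, size = 1.3)

# The same scatterplot with map-style formatting applied on top
p.map <- p.scatter +
  theme(panel.background = element_rect(fill = "#708090"),
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank())
```

Printing `p.scatter` and then `p.map` shows the same points twice; only the styling differs. Adding polygon layers for neighborhoods and water, as in the code above, is what turns it into a recognizable map.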

To be clear, there is a lot of data-manipulation and prep-work that I didn’t show here.

You also need to be able to build a plot like this iteratively. That is, you need to have a solid understanding of the design process for creating a visualization like this.
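One way to picture that iterative process is to build the plot one layer at a time and look at it after each step. The sketch below uses hypothetical stand-in data (a single square "neighborhood" called `df.square` and random points in `df.pts`), not the real SF files, and the three steps are just one plausible ordering rather than the exact sequence I followed.

```r
library(ggplot2)

# Hypothetical stand-in data: one square polygon and random points inside it
df.square <- data.frame(
  long  = c(0, 1, 1, 0),
  lat   = c(0, 0, 1, 1),
  group = 1
)
set.seed(1)
df.pts <- data.frame(X = runif(200), Y = runif(200))

# Iteration 1: draw only the base polygon
p1 <- ggplot() +
  geom_polygon(data = df.square, aes(x = long, y = lat, group = group),
               fill = "#404040", colour = "#5A5A5A")

# Iteration 2: layer the points on top of the polygon
p2 <- p1 +
  geom_point(data = df.pts, aes(x = X, y = Y),
             color = "#FFFF33", alpha = 0.3, size = 1.3)

# Iteration 3: add a title and strip the axes and grid
p3 <- p2 +
  ggtitle("Iterative build (synthetic data)") +
  theme(panel.background = element_rect(fill = "#708090"),
        axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank())
```

Printing `p1`, `p2`, and `p3` in turn shows the plot at each stage, which makes it easier to tune one aesthetic choice at a time.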

But at its core, this visualization isn’t as difficult to create as it might seem.

I’ll write up some in-depth material showing you how to make a visualization like this, step-by-step. If you’re interested in learning how to produce something like this, sign up for the email list and I’ll let you know when the in-depth tutorial is available.