In the few months since COVID-19 struck, everyone has suddenly started talking about supply chains.

Just a few months ago, this was sort of a “boring” topic, only discussed by import specialists and corporate operations managers.

But now, here we are, wondering where our toilet paper is going to come from.

I’m being a little playful here, but it’s obviously serious, and I suppose that’s why everyone is talking about it.

Everyone, almost everywhere, is starting to rethink how they will get critical supplies.

This has me personally thinking about these things. And because I live in the great state of Texas in the United States, I’ve been thinking about Texas in particular.

Thinking about Texas, thinking about data

I’m personally very bullish on Texas.

There’s news right now that Tesla might move its headquarters to Texas (Musk said “Texas or Nevada”), and Austin, Texas is now one of the two finalists for the new Tesla Gigafactory.

Ultimately, I think that Texas will be one of the major business and technology hotspots of the next couple of decades.

Here’s a quick Twitter thread that explains some of the details …

To put it simply: Texas is geographically well positioned to take advantage of new shifts in trade.

All of this got me thinking about the details … the details about Texas, supply chains, trade infrastructure, and geography.

Mapping Texas ports with R

As it turns out, as I was beginning to think about Texas and supply chains, I was also playing with some new techniques in R.

Specifically, I’ve been playing with creating maps with the sf package.

I’ve made some maps in the past, but to some extent, it was a little hard. Creating maps with R was challenging for a few reasons.

But the new sf package makes things fairly easy.

All that being said, I realized that this would be a good opportunity to create a map related to Texas supply chain infrastructure.

We’ll create a map of 13 Texas ports

In this tutorial and the next tutorial, we’ll create a map of 13 Texas ports.

I’m going to split this project up into two parts:

  • Here in part 1, we’ll just create a rough draft of our map. Really, we just want to get the basic code working.
  • In the next tutorial, part 2, we’ll polish the map to make it look good.

Let’s get started with part 1.

Import packages

First, let’s just import some packages.

We’ll be using the tidyverse package, which will load both dplyr and ggplot2. We need dplyr for basic dataframe manipulation, and ggplot2 for data visualization.

The sf and ggspatial packages will give us some specific mapping tools.

We’ll use rnaturalearth and maps for some map data.

And we’ll need tidygeocoder to get the lat/long coordinates for the ports that we want to plot.

# IMPORT PACKAGES

library(tidyverse)
library(sf)
library(ggspatial)
library(rnaturalearth)
library(tidygeocoder)
library(maps)

Now that we have the packages, let’s get some data.

Get map data

Here, we’ll get the data for our maps.

There are a few different ways to get map data in R, but here we’ll get the data for a worldwide country map with ne_countries() and we’ll get the data for a US state map with map().

# GET MAP DATA

world_map_data <- ne_countries(scale = "medium", returnclass = "sf")
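# maps::map() is called explicitly because purrr (loaded by tidyverse) also exports a map() function
state_map_data <- maps::map('state', fill = TRUE, plot = FALSE) %>% st_as_sf()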

Notice that in both cases, we’re getting this data as an sf object, which is like a special kind of dataframe that includes geospatial data.
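
To make that concrete, here is a minimal sketch for inspecting one of these objects. It rebuilds state_map_data the same way as above. The inspection calls (class(), names(), st_crs(), st_geometry()) are standard base R and sf functions, but using them here is my addition, not part of the original workflow.

```r
library(sf)
library(maps)

# Rebuild the state map the same way as in the step above
state_map_data <- st_as_sf(maps::map("state", fill = TRUE, plot = FALSE))

# An sf object is still a data frame, with extra spatial metadata attached
class(state_map_data)

# One ordinary column (ID, the state name) plus one geometry column
names(state_map_data)

# The coordinate reference system stored with the geometries
st_crs(state_map_data)

# The geometry column itself
st_geometry(state_map_data)
```

Because the object is still a dataframe underneath, ordinary tools like head() and dplyr verbs keep working on it.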

Create preliminary plot

Let’s just start veeeeery simple and just plot one of the datasets.

Plot world data

Here, we’ll plot world_map_data to create a map of the world.

To do this, we’ll use ggplot, but we’re going to use the geom_sf() function to actually plot this specific data.

ggplot() +
  geom_sf(data = world_map_data)

OUT:

An image of a simple map of the world, made with ggplot2 in R.

This is a very simple map.

But notice something important: we didn’t have to specify any “aesthetic mappings.” We didn’t need to tell ggplot anything about the long and lat coordinates. Nothing about how to plot polygons (which was a requirement in the past).

We just tell ggplot to use geom_sf(), and pass an sf object to the data parameter. From there, if the sf object is structured correctly, ggplot knows that it’s geospatial data and will plot it accordingly.

This makes working with geospatial data dramatically easier.
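
For contrast, this is roughly what the older polygon-based approach looks like. It’s a sketch, not code from this tutorial: map_data(), geom_polygon(), and coord_quickmap() are standard ggplot2 functions, while state_polygons and old_style_map are just illustrative names.

```r
library(ggplot2)
library(maps)

# map_data() returns a plain data frame of polygon vertices,
# with long, lat, and group columns (not an sf object)
state_polygons <- map_data("state")

# Every aesthetic has to be spelled out, including which
# vertices belong to which polygon (the group column)
old_style_map <- ggplot(state_polygons, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey90", color = "grey40") +
  coord_quickmap()

old_style_map
```

With geom_sf(), the x, y, and group bookkeeping is handled by the geometry column instead.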

Zoom in on Texas Coast

Next, we’ll zoom in on the Gulf Coast around Texas.

To do this, we’ll use the coord_sf() function to set the limits on latitude and longitude for the plotting window. Notice that we’re doing that with the xlim and ylim parameters.

ggplot() +
  geom_sf(data = world_map_data) +
  geom_sf(data = state_map_data) +
  coord_sf(xlim = c(-100, -91), ylim = c(25,33))

OUT:

An image of a simple map of the Texas/Louisiana coast made in R with ggplot2.

Frankly, this is not bad. It’s simple, but it’s starting to look like something.

A few notes:

Here we used geom_sf() a second time to add a new layer. Specifically, we added the state borders from state_map_data, so we can see the border between Texas and Louisiana.

We also zoomed in on this particular region with coord_sf(). Zooming and filtering your data is very important for data analysis, so you eventually need to know how to do this.
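
If you would rather not pick the limits by eye, one option is to filter the sf object down to a single state and read off its bounding box. This is a hedged sketch: st_bbox() is a standard sf function, the ID column with lowercase state names is what st_as_sf() produces for the maps data, and texas and texas_bbox are illustrative names of mine.

```r
library(ggplot2)
library(sf)
library(maps)

state_map_data <- st_as_sf(maps::map("state", fill = TRUE, plot = FALSE))

# Filtering: sf objects can be subset like ordinary data frames
texas <- state_map_data[state_map_data$ID == "texas", ]

# Bounding box of Texas: xmin, ymin, xmax, ymax in degrees
texas_bbox <- st_bbox(texas)
texas_bbox

# Zooming: reuse the bounding box as the plotting window
ggplot() +
  geom_sf(data = state_map_data) +
  coord_sf(xlim = c(texas_bbox[["xmin"]], texas_bbox[["xmax"]]),
           ylim = c(texas_bbox[["ymin"]], texas_bbox[["ymax"]]))
```

This frames the whole state rather than just the coast, so for a coastal map you would probably still narrow the limits by hand afterwards.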

Add ports

The next part is a little harder, so we’ll do it in steps.

Get port names

Here, I’m just creating a vector of port names (strings).

The vector is called portlist.


portlist = c('Brownsville, Texas'
            ,'Port Isabel, Texas'
            ,'Port Mansfield, Texas'
            ,'Corpus Christi, Texas'
            ,'Port Lavaca, Texas'
            ,'Port Freeport, Texas'
            ,'Texas City, Texas'
            ,'Port Galveston, Texas'
            ,'Port Houston, Texas'
            ,'Port Sabine Pass, Texas'
            ,'Port Arthur, Texas'
            ,'Port Beaumont, Texas'
            ,'Port of Orange, Texas'
            )

And we need to transform this into a tibble.

port_data = tibble(location = portlist)

The new tibble is called port_data, and it has the location names of the ports in a variable called location.

Add lat/long variables

Now, we’ll add a lat and long variable to the tibble.

#---------------------------------
# CREATE EMPTY LAT, LONG VARIABLES
#---------------------------------
port_data %>% 
  mutate(lat = NA
         ,long = NA
         ) ->
  port_data
  

To do this, we’re just using the mutate function from dplyr.

Let’s take a quick look at it:

#inspect
head(port_data)

OUT:

# A tibble: 6 x 3
  location                lat  long
  <chr>                 <lgl> <lgl>
1 Brownsville, Texas    NA    NA   
2 Port Isabel, Texas    NA    NA   
3 Port Mansfield, Texas NA    NA   
4 Corpus Christi, Texas NA    NA   
5 Port Lavaca, Texas    NA    NA   
6 Port Freeport, Texas  NA    NA  

Now we have a dataframe (i.e., tibble) with the location names, lat, and long.

Get port coordinates

Next, we’ll get the coordinates for the port locations.

Ultimately, we’ll do this with the tidygeocoder::geo_osm() function. Because we call it for one location at a time, we’ll run it inside a for loop.

#------------------
# GEOCODE LOCATIONS
#------------------
for (i in seq_len(nrow(port_data))) {
  # look up one port at a time and store its coordinates in row i
  coordinates = geo_osm(port_data$location[i])
  port_data$long[i] = coordinates$long
  port_data$lat[i] = coordinates$lat
}

Frankly, I really dislike for-loops in R, so I’m not going to comment.
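
If you share that dislike, tidygeocoder also has a geocode() function that takes a whole dataframe plus the name of an address column. The sketch below is an alternative, not the method used in this tutorial: geocode_ports() is a hypothetical helper name, and I’m assuming a version of tidygeocoder whose geocode() supports method = "osm". It queries the OpenStreetMap Nominatim service, so the example call is left commented out and needs an internet connection.

```r
library(tibble)
library(tidygeocoder)

# Hypothetical helper: geocode a character vector of place names in one call.
# geocode() appends lat and long columns to the tibble it is given.
geocode_ports <- function(locations) {
  geocode(tibble(location = locations), address = location, method = "osm")
}

# Example call (requires internet access, so it is commented out here):
# geocode_ports(c("Brownsville, Texas", "Port Isabel, Texas"))
```

Note that this starts from the location column alone, so the empty lat and long placeholder columns would not be needed with this approach.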

We should, however, take a look at the data.

#inspect
head(port_data)

OUT:

# A tibble: 6 x 3
  location                lat  long
  <chr>                 <dbl> <dbl>
1 Brownsville, Texas     25.9 -97.5
2 Port Isabel, Texas     26.1 -97.2
3 Port Mansfield, Texas  26.6 -97.4
4 Corpus Christi, Texas  27.7 -97.4
5 Port Lavaca, Texas     28.6 -96.6
6 Port Freeport, Texas   28.9 -95.3

Alright.

We have our ports with the lat and long coordinates. We’re ready.
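
As an optional aside, lat/long columns like these can also be turned into an sf point layer, so the ports become the same kind of object as the map data. This is a minimal sketch rather than part of the tutorial’s workflow: it uses only the six rounded coordinates printed above, port_sample and port_points are illustrative names, and the choice of crs = 4326 (plain WGS 84 lat/long) is my assumption about the geocoder’s output.

```r
library(tibble)
library(sf)

# First six ports, with the rounded coordinates printed above
port_sample <- tibble(
  location = c("Brownsville, Texas", "Port Isabel, Texas", "Port Mansfield, Texas",
               "Corpus Christi, Texas", "Port Lavaca, Texas", "Port Freeport, Texas"),
  lat  = c(25.9, 26.1, 26.6, 27.7, 28.6, 28.9),
  long = c(-97.5, -97.2, -97.4, -97.4, -96.6, -95.3)
)

# Longitude is x and latitude is y; EPSG:4326 is assumed here (WGS 84 lat/long)
port_points <- st_as_sf(port_sample, coords = c("long", "lat"), crs = 4326)
port_points
```

An object like port_points could then be drawn with geom_sf(data = port_points) instead of geom_point() with explicit x and y aesthetics.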

Plot rough draft of Texas ports

Let’s plot the data.

ggplot() +
  geom_sf(data = world_map_data) +
  geom_sf(data = state_map_data) +
  geom_point(data = port_data, aes(x = long, y = lat), color = 'red') +
  coord_sf(xlim = c(-100, -92), ylim = c(25,33))

OUT:

A map of the Texas coast, with the locations of 13 different Texas ports plotted with red points.

Not bad.

There’s still a lot that we need to do to improve this.

We need to:

  • Change the color of the land and water
  • Add a legend
  • Add labels
  • Add a title

Etcetera.

There’s more to do, but this is a good rough draft.

We’ll continue this tutorial and polish the map up in part 2.
