 Last week, we created a map of world oil production, by country.

This week, I’ll show you how to make a slight modification: highlighting specific countries according to a variable in your data frame.

We will re-use the data from the last tutorial; you can find the code that creates it there.

Before we do any plotting, let’s just inspect the data:

```
library(tidyverse)  # provides %>%, glimpse(), and ggplot2

# map.oil is the data frame built in the previous tutorial
map.oil %>% glimpse()

# Observations: 99,338
# Variables: 9
# $ long             -69.89912, -69.89571, -69.94219, -70.00415, -70.06612,...
# $ lat              12.45200, 12.42300, 12.43853, 12.50049, 12.54697, 12.5...
# $ group            1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, ...
# $ order            1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17,...
# $ region           "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", ...
# $ subregion        NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
# $ rank             NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
# $ opec_ind         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
# $ oil_bbl_per_day  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
```

So what do we have here?

The data contains variables that enable us to plot countries as polygons: long, lat, region, and group.

The data also contains a variable called oil_bbl_per_day: the amount of oil each country produces, in barrels per day.
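To see what the group variable is doing, here is a minimal sketch on a hypothetical toy data frame (`toy.map` holds two made-up squares, not real country outlines). Each group value marks one closed polygon, and the vertices of each group are connected in row order.

```r
library(ggplot2)

# Hypothetical toy data: two squares standing in for two "countries".
# Column names mirror map.oil; the coordinates and values are made up.
toy.map <- data.frame(
  long   = c(0, 1, 1, 0,   2, 3, 3, 2),
  lat    = c(0, 0, 1, 1,   0, 0, 1, 1),
  group  = c(1, 1, 1, 1,   2, 2, 2, 2),
  region = rep(c("A", "B"), each = 4),
  oil_bbl_per_day = rep(c(1000, 50000), each = 4)
)

# group = group tells geom_polygon() which vertices belong together;
# without it, all eight points would be joined into a single shape
toy.plot <- ggplot(toy.map, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = oil_bbl_per_day))

toy.plot
```

The real map works the same way, just with thousands of vertices per country instead of four.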

Let’s make a basic map that plots this data, with each country “filled in” according to the amount of oil it produces.

```
#=====
# PLOT
#=====

# BASIC (this is a first draft)

ggplot(map.oil, aes( x = long, y = lat, group = group )) +
  geom_polygon(aes(fill = oil_bbl_per_day))
```

Next, let’s make a slightly different map. Here, we’re going to remove the mapping to the fill aesthetic, and we’re going to map a different variable, opec_ind, to the color aesthetic.

```
#----------------------------------------------
# PLOT with red highlight around OPEC countries
#----------------------------------------------

ggplot(map.oil, aes( x = long, y = lat, group = group )) +
  geom_polygon(aes(color = as.factor(opec_ind))) +
  scale_color_manual(values = c('1' = 'red', '0' = NA))
```

Essentially, we have used the color aesthetic in combination with scale_color_manual() to control the border color of the countries. Specifically, we have highlighted the OPEC countries in red.
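The same highlighting idea can be tried in isolation. The sketch below uses a hypothetical toy data frame (`toy.flag`, with a made-up `flag_ind` column playing the role of opec_ind). One addition that is not in the code above: it passes `na.value = NA` explicitly, on the assumption that some ggplot2 versions may otherwise substitute the scale’s default missing-value color for the NA entry.

```r
library(ggplot2)

# Hypothetical toy data: two squares, one flagged (flag_ind = 1), one not
toy.flag <- data.frame(
  long     = c(0, 1, 1, 0,   2, 3, 3, 2),
  lat      = c(0, 0, 1, 1,   0, 0, 1, 1),
  group    = rep(c(1, 2), each = 4),
  flag_ind = rep(c(1, 0), each = 4)
)

flag.plot <- ggplot(toy.flag, aes(x = long, y = lat, group = group)) +
  # fill is set (not mapped) so that only the border varies
  geom_polygon(aes(color = as.factor(flag_ind)), fill = "grey80") +
  # '1' gets a red border; '0' gets NA, which draws no border.
  # na.value = NA is an assumption-driven safeguard, see the note above.
  scale_color_manual(values = c('1' = 'red', '0' = NA), na.value = NA)

flag.plot
```

Converting the indicator with as.factor() matters here: scale_color_manual() is a discrete scale, so the 0/1 column has to be treated as categories rather than as a number.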

Now, let’s combine the two techniques: we will fill in the color of the countries using the fill aesthetic, and we will highlight the OPEC countries by mapping a variable to the color aesthetic.

```
#-------------------------------------------
# PLOT
# - red highlight for OPEC
# - fill value corresponds to oil production
#-------------------------------------------

ggplot(map.oil, aes( x = long, y = lat, group = group )) +
  geom_polygon(aes(color = as.factor(opec_ind), fill = oil_bbl_per_day)) +
  scale_color_manual(values = c('1' = 'red', '0' = NA))
```

There’s more that we’ll need to do to create the finalized version, but all things considered, this is pretty good. It shows the information we want to display; it just needs some formatting.

So, now let’s create the final, formatted map.

```
#=====================
# FINAL, FORMATTED MAP
#=====================

ggplot(map.oil, aes( x = long, y = lat, group = group )) +
  geom_polygon(aes(fill = oil_bbl_per_day, color = as.factor(opec_ind))) +
  # NOTE: the opening of this call is missing from the post as extracted.
  # scale_fill_gradientn() is inferred from the arguments that survive, and the
  # five colours (one per break) are a placeholder assumption that may differ
  # from the original palette.
  scale_fill_gradientn(colours = c('#461863','#404E88','#2A8A8C','#7FD157','#F9E53F')
                       ,values = scales::rescale(c(100,96581,822675,3190373,10000000))
                       ,labels = scales::comma
                       ,breaks = c(100,96581,822675,3190373,10000000)
                       ) +
  guides(fill = guide_legend(reverse = TRUE)) +
  labs(fill = 'Barrels per day\n2016'
       ,color = 'OPEC Countries'
       ,title = 'OPEC countries produce roughly 44% of world oil'
       ,x = NULL
       ,y = NULL) +
  theme(text = element_text(family = 'Gill Sans', color = '#EEEEEE')
        ,plot.title = element_text(size = 28)
        ,plot.subtitle = element_text(size = 14)
        ,axis.ticks = element_blank()
        ,axis.text = element_blank()
        ,panel.grid = element_blank()
        ,panel.background = element_rect(fill = '#333333')
        ,plot.background = element_rect(fill = '#333333')
        # newer ggplot2 releases prefer legend.position.inside for coordinates
        ,legend.position = c(.18,.36)
        ,legend.background = element_blank()
        ,legend.key = element_blank()
        ) +
  annotate(geom = 'text'
           ,label = 'Source: U.S. Energy Information Administration\nhttps://en.wikipedia.org/wiki/List_of_countries_by_oil_production\nhttps://en.wikipedia.org/wiki/OPEC'
           ,x = 18, y = -55
           ,size = 3
           ,family = 'Gill Sans'
           ,color = '#CCCCCC'
           ,hjust = 'left'
           ) +
  # only the OPEC level ('1') gets a legend entry
  scale_color_manual(values = c('1' = 'orange', '0' = NA), labels = c('1' = 'OPEC'), breaks = c('1'))
```

Let’s point out a few things.

First, the fill color scale has been carefully crafted to optimally show differences between countries.

Second, we are simultaneously using the highlighting technique to highlight the OPEC countries.

Finally, notice that we’re using the title to “tell a story” about the highlighted data.
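On the first point, the breakpoints used for the fill scale in the final map are unevenly spaced. In that code, `scales::rescale()` converts them into positions between 0 and 1 before they are passed to the `values` argument. A quick way to inspect that mapping on its own (the name `oil.breaks` is just for illustration):

```r
library(scales)

# Breakpoints copied from the final map's fill scale (barrels per day)
oil.breaks <- c(100, 96581, 822675, 3190373, 10000000)

# rescale() linearly maps the smallest break to 0 and the largest to 1;
# the breaks in between land at their proportional positions
oil.positions <- rescale(oil.breaks)

print(oil.positions)
```

The first four positions all fall in the lower third of the range, so most of the color steps are spent separating small and mid-sized producers rather than being spread evenly up to 10 million barrels per day.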

All told, there is a lot going on in this example.
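One piece that happens off-screen is the percentage quoted in the title. A figure like that can be computed with dplyr, with one caveat: a data frame shaped like map.oil has one row per polygon vertex, so each country’s production value is repeated many times. The sketch below uses a hypothetical toy data frame (`toy.oil`, with made-up numbers) to show the de-duplication step; it is not the calculation behind the 44% figure.

```r
library(dplyr)

# Hypothetical toy data in the shape of map.oil: one row per polygon vertex,
# so each country's value is repeated. All numbers are made up.
toy.oil <- data.frame(
  region          = rep(c("A", "B", "C"), times = c(4, 3, 5)),
  opec_ind        = rep(c(1, 0, 0),       times = c(4, 3, 5)),
  oil_bbl_per_day = rep(c(400, 500, 100), times = c(4, 3, 5))
)

opec.share <- toy.oil %>%
  # collapse to one row per country before summing; otherwise countries
  # with more vertices would be counted more often
  distinct(region, opec_ind, oil_bbl_per_day) %>%
  summarise(share = sum(oil_bbl_per_day[opec_ind == 1], na.rm = TRUE) /
                    sum(oil_bbl_per_day, na.rm = TRUE)) %>%
  pull(share)

opec.share
```

The `na.rm = TRUE` arguments matter for real data, because countries without a production figure carry NA in oil_bbl_per_day.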

To create something like this, you need to understand basic ggplot2 syntax, dplyr, visual design, data storytelling, and more.

Learning all of this can take a very long time, but if you know how to learn and how to practice data science, you could get there within a few months (or faster).

# Sign up now, and discover how to rapidly master data science

It’s possible to learn and master data science tools faster than you thought possible.

Even though the example in this blog post is complicated, it is very easy to learn to create visualizations like this, if you know what tools to learn, how to practice those tools, and how to put those tools together.

Sharp Sight is dedicated to teaching you how to master the tools of data science as quickly as possible. We teach data science, but we also teach you how to learn.
