“Master the basics.”
That’s a common mantra here at Sharp Sight.
Loyal readers know what I mean by “master the basics.” To master data science, you need to master the foundational tools.
That means knowing how to create essential plots like:
- Bar charts
- Line charts
And performing data manipulations like:
- Creating new variables
- Joining datasets
After the foundations, the little details matter
Having said that, the little details matter.
After you master the basics, you need to learn the little details.
A great example of this is plot annotation.
Adding little details like plot annotations helps you communicate more clearly and “tell a story” with your plots.
Moreover, annotations help your data visualizations “stand on their own.”
What I mean by this is that they help your plots communicate and “tell a story” without you being there to do the communicating.
Certainly, as a data scientist, if you create a report or analysis, there will be many instances where you will personally present your work to an audience. In these instances, you will be there to explain your work and give a personal “voice over” for your visualizations.
But that’s not always the case. For example, you’ll often send your analyses to partners as reference documents. Other times, you’ll publish them (maybe on a blog, or internal company website). In these instances, you won’t be there to explain your work. You need your work to “speak for you.” The charts need to communicate on their own. To make sure that your work is still effective when you are not there, you’ll often need to annotate your work.
Therefore, annotations are one of the “little details” that you need to learn after you learn the foundations.
A simple annotation in ggplot2
Here, I’ll show you a very simple example of a plot annotation in ggplot2.
Before I show it to you though, I want to introduce you to a learning principle that you can use when you’re learning and practicing data science techniques.
To learn annotate(), start with a very simple example
When you’re learning a new skill, it’s very effective to learn and practice with very simple examples. The simpler the better.
As a side note, this is one of the reasons that I strongly discourage the “jump in and build something” method of learning. When people “jump in” and work on a project, they typically select something that’s far too complicated, so they spend all of their time struggling to do simple things that they should have learned before working on a project.
So before you jump into a project, learn individual techniques. And when you’re learning a new technique, use very, very basic examples.
Code: how to create an annotation in ggplot2
With that in mind, I’ll show you a very, very simple plot annotation in ggplot2.
Here we have a histogram with a dashed line at the mean. The mean is 5.
library(ggplot2)

# Simulate 10,000 normally distributed values centered at 5
set.seed(10)
df.rnorm <- data.frame(rnorm = rnorm(10000, mean = 5))

# Histogram with a dashed red line at the mean
ggplot(data = df.rnorm, aes(x = rnorm)) +
  geom_histogram(bins = 50) +
  geom_vline(xintercept = 5, color = "red", linetype = "dashed")
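As a quick sanity check on where the dashed line sits, here is a minimal sketch that recomputes the sample mean of the simulated data. It assumes the same seed and data frame as the code above; sample_mean is an illustrative name that is not part of the original code. Because the values are random draws, the sample mean should land close to 5 rather than exactly on it.

```r
set.seed(10)
df.rnorm <- data.frame(rnorm = rnorm(10000, mean = 5))

# sample_mean is a hypothetical name used only for this check
sample_mean <- mean(df.rnorm$rnorm)

# Print the sample mean rounded to two decimal places
print(round(sample_mean, 2))
```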
We want to add an annotation that explicitly calls out the value of the mean.
Add an annotation with the annotate() function
ggplot(data = df.rnorm, aes(x = rnorm)) +
  geom_histogram(bins = 50) +
  geom_vline(xintercept = 5, color = "red", linetype = "dashed") +
  # Add a white text label just to the left of the mean line
  annotate("text", label = "Mean = 5", x = 4.5, y = 170, color = "white")
Here, we’ve added the annotation that says “Mean = 5.”
This is very straightforward.
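To accomplish this, we’ve used the annotate() function.
The first argument of the function is "text", which tells annotate() to draw a text annotation.
The next piece of syntax within annotate() is label = "Mean = 5", which sets the text that the annotation displays.
Next, we specify the exact location of the annotation by detailing the x and y coordinates with the x and y parameters (here, x = 4.5 and y = 170).
Finally, we specify the color. In this case, I’ve set the color to white.
You could simplify this code even further by removing the color specification and letting it default to black.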
Having said that, in this case, the white text looks best against the dark grey histogram, so I left that code in.
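To see what the simplified version looks like, here is a minimal sketch that drops the color argument so the text falls back to the default text color, which is black in ggplot2. The coordinates x = 7.5 and y = 500 and the name p_default are illustrative choices, not part of the original code; they are picked so that the label should sit over the plot background, where dark text is easier to read, instead of over the histogram bars.

```r
library(ggplot2)

set.seed(10)
df.rnorm <- data.frame(rnorm = rnorm(10000, mean = 5))

# Same plot as before, but annotate() has no color argument,
# so the label uses the default text color.
# x = 7.5 and y = 500 are illustrative coordinates in the
# right-hand tail, above the short bars.
p_default <- ggplot(data = df.rnorm, aes(x = rnorm)) +
  geom_histogram(bins = 50) +
  geom_vline(xintercept = 5, color = "red", linetype = "dashed") +
  annotate("text", label = "Mean = 5", x = 7.5, y = 500)

p_default
```

Placing the label in the tail is one option; keeping it on top of the bars, as in the original, is the case where the white text reads better.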
To remember the annotate() technique, you need to practice it
Nevertheless, my bet is that many people will forget it in the long run because they fail to practice it. They’ll “cut and paste” once or twice, and then quickly forget it.
I’ve said before that it’s not enough just to learn a new technique. You need to remember it in the long run.
The problem is that even if you learn a new ggplot2 technique today, you’re very, very likely to forget within a few days.