A few weeks ago, I wrote an article saying that you should master R. The basic argument is that if you want to actually work as a data scientist, you need to know the essential tools backwards and forwards.
In response, a reader left a comment. I have to say that it’s unfortunate to read something like this, but sadly it’s very common.
Essentially, he wrote that he took an online course, but still can’t write code:
“I was able to take a data analysis course on edX with no problem, following all the instruction and guides they provide, acing the course, but as you noted astutely, not learning or remembering much of it, because I had mostly cut and pasted the code. I tried to do some elementary analysis recently, after almost a year later, and was not able to even do that”
Does this sound like you?
If you’re an aspiring data scientist, and you’re taking online courses and reading books, but not getting results, you need to regroup.
How many online courses have you taken?
How many data science books have you bought?
After taking courses and buying books, can you write R code rapidly and from memory?
… or instead, are you doing Google searches to find out how to execute basic data science techniques in R?
Which one are you? Are you fluent in R, or are you constantly struggling to remember the essentials?
If you’re still struggling to fluently write code, keep reading.
Are you fluent in R?
Fluent – able to speak a language accurately, rapidly, and confidently – in a flowing way.
In casual use, “fluency” refers to language proficiency broadly, while in narrow use it refers to speaking a language flowingly, rather than haltingly.
I find it a little odd that the concept of “fluency” is used so rarely among programmers and technologists. Honestly, this idea of “fluency” almost perfectly encapsulates the skill level you need in order to achieve your goals.
Read the definition. To be fluent is to be able to execute proficiently, accurately, rapidly, and confidently.
Is that how you write R data science code?
Do you write code proficiently, accurately, rapidly, and confidently? Do you write your code from memory?
Or do you write your code slowly? Laboriously? Can you remember the syntax at all? Are you constantly looking things up?
You need to be honest with yourself, because if you have weaknesses as a data scientist (or data science candidate), the only way you can correct those weaknesses is by being honest about where you need to improve.
The reality is that many data science students are not fluent in the essential techniques.
To be clear, when I say “essential techniques” I’m not talking about advanced techniques, like machine learning, deep learning, etc. If you’re a beginner or intermediate data science student, it’s normal to still struggle with machine learning.
No. I’m talking about the essential, foundational techniques, like data manipulation, data visualization, and data analysis.
Most data science students (and even some practitioners) can’t do these things fluently.
If that sounds like you, you need to rethink your learning strategy. If you can’t fluently execute essential techniques – like visualization, manipulation, and analysis – then you need to revisit those things and focus on them.
To get a data science job, to keep a data science job, and to excel in a data science job, you need to master the foundational data science techniques.
Getting a job as a data scientist without fluency in R (or another data language) is like trying to get a job as a writer for a Spanish magazine without having basic fluency in Spanish.
Don’t kid yourself.
Your first milestone: fluency with the essential techniques
Your real first milestone as an aspiring data scientist is achieving basic fluency in writing R code. More specifically, you need to be fluent in writing R code to perform data visualization, data manipulation, and data analysis. These are the foundations, and you need to be able to execute them proficiently, rapidly, and from memory.
If you can’t do visualization, manipulation, and analysis rapidly and from memory, then you’re probably not ready to do real data science work, which means you’re not ready to apply for a data science job.
Your first milestone: fluency in the essentials.
Let’s break that down more. Here are some of the things you should be able to execute “with your eyes closed”:
- Data visualization basics
– Bar charts
– Line charts
– Box plots
– Small multiples (these are rarely used, but very useful)
- Intermediate visualization
– Manipulating colors
– Manipulating size (i.e., bubble charts)
– Dealing with data visualization problems (e.g., overplotting)
– Formatting plots
- Data manipulation
– How to read in a dataset (from a file, or inline)
– How to add variables to a dataset
– How to remove variables from a dataset
– How to aggregate data
… I could go on.
This is a brief (and incomplete) list of things that you should be able to execute without even thinking about it. These are basic, essential tools. If you can’t do them fluently, you need to refocus your efforts until you can.
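To make that list concrete, here is a rough sketch of what a single practice drill might look like, assuming you use dplyr and ggplot2. The `sales` data frame and its column names are made up for this example, and the values are invented purely for practice. It is one possible drill, not the only way to do any of these tasks.

```r
library(dplyr)
library(ggplot2)

# Create a small dataset inline (hypothetical values, for practice only)
sales <- data.frame(
  region  = c("east", "east", "west", "west"),
  quarter = c(1, 2, 1, 2),
  revenue = c(100, 120, 80, 95),
  cost    = c(60, 70, 50, 65)
)

# Add a variable to the dataset
sales <- mutate(sales, profit = revenue - cost)

# Remove a variable from the dataset
sales_slim <- select(sales, -cost)

# Aggregate: total profit per region
by_region <- sales %>%
  group_by(region) %>%
  summarise(total_profit = sum(profit))

# Bar chart of the aggregated data
p_bar <- ggplot(by_region, aes(x = region, y = total_profit)) +
  geom_col()

# Line chart: revenue over time, one line per region
p_line <- ggplot(sales, aes(x = quarter, y = revenue, color = region)) +
  geom_line()

# Box plot: distribution of revenue within each region
p_box <- ggplot(sales, aes(x = region, y = revenue)) +
  geom_boxplot()

# Small multiples: the same line chart, one panel per region
p_facets <- p_line + facet_wrap(~ region)

# Intermediate: a bubble chart on the built-in mtcars data, with color,
# size, transparency (to reduce overplotting), and basic formatting
p_bubble <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp, color = factor(cyl))) +
  geom_point(alpha = 0.5) +
  labs(title = "Fuel economy vs. weight", x = "Weight", y = "Miles per gallon") +
  theme_minimal()
```

Type the name of any plot object (for example, `p_bar`) at the console to draw it. The point of a drill like this is to be able to retype it from a blank script, without looking anything up.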
How to become fluent in R
This sounds great in theory. “Fluent in R!” It sounds good.
But how do you accomplish this?
I’ve said it before, and I’ll repeat:
To become fluent in data science in R, you need to practice the essential techniques. You need to drill these essential techniques until they become second nature.
It’s not enough to just cut-and-paste some code one time and assume that you’ve learned it.
And it’s not enough to watch a few videos. To be clear: videos are an excellent way to learn something new the very first time. Good video lectures can teach you how something works and offer explanations. They are good for giving you basic understanding.
However, learning a technique from a lecture video is not the same thing as practiced, internalized skill. Lots of people watch a video or a lecture and say “yep, that makes sense.” Great.
But they don’t actually practice the technique, so they never internalize it. Actually, what happens is that they “learn” the technique from the video, but forget the technique soon after. They forget because they fail to practice.
Example: learning R is like learning a foreign language
As I’ve already suggested, learning R is much like learning a foreign language, like Spanish.
Let’s use that as an example. Let’s say you’re learning Spanish.
One day, in a lecture, you learn a little piece of grammar. As you learn that piece of grammar in the lecture, you understand it. Because you “learned” it, you’ll be able to use that grammatical construct in class by simply repeating it. You’ll also likely be able to use it for a few minutes or hours after class (although you’re likely to struggle a little bit).
Next, you leave the classroom and don’t practice that grammatical construct.
A week later, do you think you’d still be able to use it? Would you remember it? How “fluent” would you be with that piece of grammar?
Here’s my bet: if you don’t practice that piece of grammar, you will forget it.
Foreign language vocabulary is the same. To remember a vocabulary word, it’s not enough to learn the word a single time. I bet you’ve had that experience: you learn the Spanish word for “cat,” and you can remember it for a few minutes, but if you don’t practice it, you will forget it.
In a foreign language, if you learn grammar and words and want to remember them in the long run, you need to practice them. You need to drill. The best way to remember a word is to learn it, and then practice it repeatedly over time.
Guess what? Learning a programming language is almost exactly the same.
To learn and remember programming language syntax, you need to practice. You need to drill the basic “vocabulary” and syntax of the programming language until you know it without thinking about it.
If you do this … if you learn the basic syntax, and practice that syntax until you can write it fluidly and quickly … you will achieve fluency.
I will repeat: identify the essential techniques and practice them relentlessly.
How long does it take to achieve basic fluency in R?
You might be asking: how long will this take?
Actually, it depends on how good you are at learning. Learning itself is a technical skill.
If you don’t know how to practice, this could take years. I know people who started learning R years ago, and they still aren’t fluent. They bought dozens of books, but still can’t write code very well because they never really practiced. Again, it’s like foreign languages. I know people who have been studying Spanish for years and they still can’t have conversations.
This is one of your major risks. It might take you years to achieve basic fluency in R.
Even worse: you might fail altogether.
The problem here is that most online courses will not show you how to practice. They might show you syntax and explain how the language works, but they don’t show you how to practice to achieve mastery.
On the other hand, there is some good news …
If you know how to practice programming languages, you could achieve basic fluency in as little as about 6 weeks.
I won’t go into the details here, but if you know how to “hack your memory” you can learn R very, very quickly. Essentially, you need to know exactly how to practice for maximum gains and efficiency.
If you know how to do this, and you practice diligently every day, it’s possible to master the foundations of R within 6-8 weeks. (In fact, it’s probably possible to do it faster, if you really hustle.)
To succeed as a data scientist, become fluent in the essentials
I strongly believe that to succeed as a data scientist, you need fluency. You need a rapid, unconscious mastery of the essential syntax and techniques for data science in R.
And that requires practice.
If you want to be a data scientist, here is my recommendation. Learn and drill the major techniques from the following R packages:
- ggplot2
- dplyr
- tidyr
- lubridate
- stringr
- forcats
- readr
These give you essential tools for manipulating, cleaning, wrangling, visualizing and analyzing data.
If you can become fluent with these, you’ll have all of the tools that you need to get things done at an entry level.
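As a hedged illustration of how the less familiar packages in that list fit together, here is a small sketch that touches readr, stringr, lubridate, forcats, and tidyr in one pass. The `users` data and its columns are invented for this example, and reading inline text with `I()` assumes readr 2.0 or later.

```r
library(readr)
library(tidyr)
library(lubridate)
library(stringr)
library(forcats)

# readr: parse a small inline CSV. I() marks the string as literal data
# rather than a file path. All values here are made up.
users <- read_csv(
  I("name,signup,plan,q1,q2\nada,2021-03-15,basic,3,5\nbob,2021-04-02,pro,4,4\ncy,2021-04-20,basic,2,6"),
  col_types = "cccdd"  # three character columns, then two numeric columns
)

# stringr: clean up a text column
users$name <- str_to_title(users$name)

# lubridate: parse the date strings, then pull out the month number
users$signup <- ymd(users$signup)
users$signup_month <- month(users$signup)

# forcats: turn plan into a factor with levels ordered by frequency
users$plan <- fct_infreq(users$plan)

# tidyr: reshape the two score columns from wide to long
users_long <- pivot_longer(
  users,
  cols = c(q1, q2),
  names_to = "quarter",
  values_to = "score"
)
```

Each step here is one "vocabulary word" from one package. Drilling them separately first, and then chaining them like this, is one reasonable way to practice.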
You’ll be prepared to work on your own projects.
You’ll be prepared for more advanced topics.
And you’ll be well on your way to becoming a top performer.
Our data science course opens next week
If you’re interested in rapidly mastering data science, then sign up for our list right now.
Next week, we will re-open registration for our flagship course, Starting Data Science.
Starting Data Science will teach you the essentials of R, including
It will also give you a practice system that you can use to rapidly master the tools from these packages.
If you sign up for our email list, you’ll get an exclusive invitation to join the course when it opens.