A key for mastering data science

A few days ago, I received an email from a Sharp Sight reader. The author of the email is having trouble learning data science in R. He’s taken several data science courses, but still has trouble with critical data science skills.

Here’s an excerpt from his email:

Email excerpt from a student who keeps forgetting what he learned about data science.

I’ve redacted the company names, but I’ll tell you that he took courses from two very well-known data science training companies.

Does this sound familiar?

What Miguel is experiencing is unfortunate, but all too common.

I regularly receive emails from people telling me that they’ve taken courses about data science but still can’t write code. They initially learned the material but eventually forgot everything.

This is a critical problem on the path to mastering data science. If you don’t practice what you learn, you will forget.

Why you’re failing to master data science skills

The critical problem for students like this isn’t necessarily learning.

Many data science skills are easy enough to understand. dplyr is fairly straightforward, and although ggplot2 has a few quirks, it is also relatively easy to understand. You can watch a few videos and learn how these tools work, then type in some code a few times and get them working on your own computer. These tools are easy to learn.

Moreover, there’s plenty of material out there to help you learn. Although the quality definitely varies, you can find information about almost anything on the internet. Even here at Sharp Sight, we give away tons of free tutorials to help you learn data science.

There are plenty of tutorials to help you learn, and many techniques (especially the “basics”) are easy enough to learn.

But they aren’t necessarily easy to remember. That’s because the human mind naturally forgets.

Have you ever had the experience of asking someone their name, only to forget it 5 minutes later? Have you ever learned a fact from a book, but then forgotten it the next day?

This is natural. It’s completely normal. The human brain naturally forgets.

So on the path to mastering data science skills, the roadblock typically isn’t learning so much as forgetting.

Forgetting what you learn is stopping you from mastering data science.

You need to master data science, not just “learn” it

At this point, I want to re-emphasize the importance of mastery.

Here at Sharp Sight, we routinely stress the importance of mastery. You need to master data science skills. It’s not enough just to “sort of know them.”

This is particularly true of basic syntax. Like it or not, R syntax (or the syntax of another data science language like Python) gives you the tools to “get things done.” You need syntax to create deliverables and ultimately create business value for your clients.

If you get hired, it’s not good enough to just have a vague memory of the syntax. It’s not good enough to be a cut-and-paste coder who goes to Google every 5 minutes to look up how to write a simple piece of syntax.

To work as a data scientist, you need to be relatively “fluent” in writing data science syntax. If you work in R – like we often do here at Sharp Sight – that means that you need to be relatively fluent in writing R code. You need to have a strong working command of the syntax. This means that you should be able to write the code for basic tools rapidly and from memory.

Mastery isn’t only important for “getting things done.” Mastery of basic syntax is also important for getting hired. To get hired as a data scientist, you need to be somewhat fluent. At the very least, you’ll need to be fluent in the “core” skills.

What do you think will happen if you walk into a data science interview and someone asks you to write the code for a scatterplot?

Do you think that it’s enough to tell them, “yeah, I watched a video on that once and typed the code a few times, but I don’t really remember how to do it”?

Uh … you won’t get the job.

To get a job and to be effective in a job, you will need to master “core” data science skills. You will need to have a strong fluency in the syntax for data manipulation, data visualization, and data analysis.

If you want to get a data science job and perform well in it, mastering the skill set is critical.

To master data science, you need to stop forgetting

To achieve this level of mastery – to remember the syntax and write code from memory – you need to stop forgetting.

Ask yourself: how much time have you spent trying to learn and master data science? Do you remember what you’ve learned?

My guess is that out of all of the syntax you’ve “learned” by reading a blog post or cutting-and-pasting code, you remember only a small fraction.

And I bet it’s frustrating. You’re wasting time! If you weren’t forgetting so much of what you learn, you’d probably have mastered R and Python by now!

This is the problem.

And to be clear, I’m not trying to berate you. We all forget. This is just how the human mind works. But like it or not, it’s a problem.

To master data science, you need to stop forgetting what you learn.

Repetition is the key to memorizing syntax

I’ll simply tell you the secret to how to stop forgetting.

It’s the secret to remembering what you’ve learned. It’s the secret to mastering data science syntax, or any skill for that matter.

The secret to remembering what you learn is repetition.

Repetition is the key. You need to repeat your practice.

Top performers repeat their practice activities

If you look at top performers of all kinds, they repeat their practice activities. Top performers know that the secret to mastery is repetition. Relentless repetition. Repetition until the skill becomes second nature. Repetition until you’ve achieved “unconscious competence.” Then a little more repetition, just to be sure.

Athletes repeat their practice

In his book Relentless, elite basketball trainer Tim Grover, the famous trainer of Kobe Bryant and Michael Jordan, explained that players like Michael and Kobe practiced their shots over and over again. He notes that on some days, an elite player might practice a single shot thousands of times. Thousands of repetitions. Michael Jordan wasn’t born the best; he earned it with millions of repetitions over years of training.

Musicians repeat their practice

We see something similar among musicians. Elite violinists and other musicians are known to practice a single “phrase” of a song over and over. At first, they practice just to learn and memorize the phrase and to execute it correctly. As they progress, they refine it and add a bit of finesse. For musicians, the process of memorization and refinement comes from careful repetition.

From personal experience, I can also tell you that learning a musical instrument is strongly dependent on repetition. You have to repeat scales and techniques until you can execute them fluidly. You repeat them until they are second nature. Eventually, you repeat them until you never forget them (I can still play songs that I learned years ago).

Navy SEALs repeat their practice

Navy SEALs, who are renowned for their ability to succeed in tough battle conditions, are also relentless trainers. They succeed because they prepare. SEALs make a point of training their skills until they have them completely memorized. Then they train some more.

Here’s a quote from retired Navy SEAL Rob Roy:

“Repetition leads to memorization and memorization leads to instinct. Therefore, one must train and train their skills until they know a procedure cold. And then they must train some more.”

I love this quote. It’s nominally about training combat techniques, but it applies to almost any skill. The secret to becoming great at anything is repetition. It doesn’t matter if you want to be a soldier or a data scientist.

If you want to walk into a data science interview and astound them with how well you know the code, you need to practice syntax. You need to repeat the syntax you learn. If you want to be fluent in writing data science code, and if you eventually want to be one of the best, you need repetition.

To master data science, you need to repeat your practice

To be a great data scientist, you need to have some core syntax memorized. You need to essentially memorize the code to perform basic techniques: data manipulation, data visualization, data analysis.

Will you occasionally forget a few things? Yes, sometimes. But there are some things that you should know cold. For example, you should be able to write the code to filter a dataset without hesitation. You should be able to write the code to create a scatterplot without flinching. You should have these memorized.

Being a great data scientist means that you need to have some things memorized. That’s what it takes to get things done.

To memorize the syntax, you need to repeat it. Over and over. And then train a bit more …

Repetition requires a practice system

Repeating your practice activities requires you to be systematic. You can’t be haphazard in how you train. If you don’t train in a systematic way, you’re greatly increasing your chances of failure.

This shouldn’t surprise you. Many skills have practice drills that are used to help people learn and master that skill.

For example, musicians often have drills to practice scales, musical phrases, chords, and arpeggios.

Basketball players have passing drills or drills to practice a particular shot over and over.

The best performers are systematic in how they train.

If you want to be a great data scientist, you need to be systematic as well.

The problem here is that almost no one trains this way. As I’ve noted before, many new data science students use the “jump in and build something” method of data science training. They just jump in and start a big project. This would be like trying to learn basketball by just playing a few games every now and again. Yes, you can learn a little, but you’ll never be as good as someone who drills and trains the details to mastery.

To be clear, data science projects can be valuable later in your training. Projects are a great way to integrate what you’ve learned, after you’ve mastered the basics. But projects are highly inefficient for mastering basic syntax, because a project doesn’t give you enough repetition. In a project, there are many pieces of code that you will only use once or twice. That’s not enough repetition to memorize a code snippet.

No. Projects aren’t the best way to master data science because they don’t give you enough repetition. You need a structured practice system.

Sign up for our email list now

In our paid training courses at Sharp Sight, we will show you just such a practice system. Our courses teach you a specialized training methodology for repeating data science syntax until you have it memorized.

Many of our students report becoming “fluent” in writing data science code. They write code from memory.

If you want to learn how to practice data science … how to repeat your data science practice so you memorize the syntax, sign up for our email newsletter and enroll in our course when it opens.

Our training courses only open up a few times per year, and the enrollment details are sent exclusively to people on our email list.

Joshua Ebner

Joshua Ebner is the founder, CEO, and Chief Data Scientist of Sharp Sight. Prior to founding the company, Josh worked as a Data Scientist at Apple. He has a degree in Physics from Cornell University. For more daily data science advice, follow Josh on LinkedIn.

4 thoughts on “A key for mastering data science”

    • I will open the R version again soon. Probably sometime in July.

      There’s also a Python version in the works, although it probably won’t be opened for quite some time.

  1. Thanks for writing this piece. I couldn’t agree more with you. I have been looking for a well-structured data science course that helps with a repetitive practice methodology. Unfortunately, I haven’t found one. When will you next open the course, and do you also have a Python version?

    • We have one course for mastering R that we’ve been running for a while now. That opens up about once every 3 or 4 months.

      We also have a Python course that we’re releasing very soon. The beta version should open to a few students in July of 2019, and then open for general enrollment by August or September.

      Keep in mind though … we only announce enrollment to people who are on our email list!

