The hard way is the best way

Here at Sharp Sight, we teach data science. And in particular, we show people how to master data science extremely fast.

Exceptional results, as fast as possible.

Our training methodology borrows heavily from elite performers and “ultra learners,” wherever we can find them.

But our promise of “exceptional results, as fast as possible” sometimes confuses people.

Some people make the mistake of thinking that this is the easy path.

I have news for you.

If you want the easy path, you’re in the wrong place.

“The Hard Way is the Easy Way”

One of the people from whom I draw inspiration is a blogger and teacher named Scott Young.

At his blog, Scott writes about rapid skill acquisition and high performance. He also wrote a book called Ultralearning, which is – as the name implies – about learning ultra fast.

Scott is an expert on rapid skill acquisition, and I read his work quite a bit (you should too) because he often has excellent tips on how to learn at an accelerated pace. His work also sometimes deals with technical subjects, so it's often highly relevant for data science students.

One of his recent posts particularly caught my eye because it addresses a confusion that many students have when they're trying to reach high levels of proficiency in a short amount of time.

His advice on this is simple: “the hard way is the easy way.”

It's a simple, elegant statement about the paradox of how to achieve elite performance.

And since I liked what he had to say, I wanted to expand on his idea and apply it specifically to data science.

In data science, the hard way produces the best results

In data science, like almost anything worth learning, the hard way is easy and the easy way is hard.

What I mean is that the path that looks easiest to most people ends up being the hardest, and the least likely to produce excellent results.

On the other hand, the path that looks hard actually ends up being easier, and it produces better results.

If you’re looking for the actual “easy” way, then you need to be willing to put the time in and do hard things.

With this in mind, there are at least 3 areas that you need to focus on, and all of them will be challenging and uncomfortable:

drilling
application
feedback

Let’s discuss each of these in turn.

To rapidly master syntax, you need to drill syntax

To rapidly master data science, you really need to memorize the critical syntax.

Most people don’t want to accept this, but it’s true.

Learning a Programming Language is Like Learning a Spoken Language

Here’s an analogy.

Learning a programming language is almost exactly the same as learning a spoken language.

If you want to learn Spanish, then one way or another, you need to know the right words. How can you speak if you don’t know the right words? Ultimately, you need to learn and remember the words.

You can either do this haphazardly and passively by trying to jump in and use the language, or you can be ultra-strategic about it and deliberately master key vocabulary.

Keep in mind that many polyglots and "language hackers" find the 500 to 1,000 most common words in a language and deliberately memorize them.

Tim Ferriss has talked about this language-learning strategy and written about it on his blog.

Language hackers deliberately practice and drill until they have memorized key vocabulary. That’s one of their secrets.

To master data science, you need to know the syntax

In data science, it’s the same.

If you want to master data science, one way or another, you need to know the syntax. You need to learn the syntax. And you need to learn the syntax well enough that you can remember it when it’s time to use it.

That is: you need to memorize the critical syntax.

Just like with a spoken language, you can be passive about this, or active and deliberate.

The passive and haphazard way is to just “Jump in and Build” something (the JIAB strategy).

The Jump in and Build strategy is exactly what it sounds like: the student decides to learn data science by jumping in and building something, such as a project or an analysis script.

This sounds “easy” in the sense that it’s more interesting and more fun.

But it’s actually one of the least effective methods in the very beginning.

Why?

Because when you work on a project, you don’t get enough repetitions of the most common, most essential syntax.

This is actually an important learning principle: what gets repeated gets remembered.

But at the same time, what does not get repeated typically gets forgotten.

If you “jump in and build” and you don’t get enough repetitions of critical syntax, you’ll probably forget most of what you learn.

Moreover, most projects require a few intermediate-to-advanced techniques, even if you only use them once or twice.

Most students get stuck when they need to solve one of these more advanced problems. Many of them get discouraged at this point and just quit.

For most people, JIAB sounds like “the easy way.”

It’s actually the way that produces some of the worst results.

Drilling Syntax is Hard

So how do you master data science syntax ultra-fast?

You practice.

Drill syntax.

Like a language learner drilling vocabulary, or a guitar player drilling scales, or a basketball player practicing a shot over and over again … you drill.

Drilling works.

Recall what I said: what gets repeated gets remembered.

If you can find a way to practice, drill, and repeat critical data science syntax, then you'll remember and master it much, much faster.

This is the easy way in the end, but the easy way looks hard.

Are you willing to sit down every day to practice?

Do you have the discipline to sit down every day for 20 or 30 minutes, or more?

Consistency and repetition work like magic, but most people aren’t willing to be that disciplined.

The easy way is actually hard to execute.

To rapidly master data science, you need to apply what you learn

Once you learn the essential syntax, you can apply what you learn.

As I mentioned earlier, the Jump In And Build strategy is ineffective and inefficient, but that’s only true if that’s the first thing you do.

“Building something” is highly effective after you’ve mastered the critical syntax.

In particular, working on projects is very effective for integrating simpler things that you’ve already learned.

For example, if you want to learn data manipulation in Python, you’ll want to learn things like how to subset rows, how to drop missing values, how to fill in missing values, etc.

So in the beginning stages, you’ll want to drill the syntax for those techniques individually.
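As an illustration of what those individual drills might look like, here is a minimal sketch in pandas. The DataFrame, its column names, and its values are made up for practice; the operations themselves (boolean indexing, `dropna()`, and `fillna()`) are standard pandas. A drill would mean retyping small snippets like these from memory until they come without looking anything up.

```python
import numpy as np
import pandas as pd

# A small, made-up practice dataset (hypothetical columns and values)
sales = pd.DataFrame({
    "region": ["East", "West", "East", "North", "West"],
    "units": [10, np.nan, 7, 3, 12],
    "price": [2.5, 3.0, np.nan, 4.0, 3.5],
})

# Drill 1: subset rows with a boolean condition
east_only = sales[sales["region"] == "East"]

# Drill 2: drop every row that contains at least one missing value
complete_rows = sales.dropna()

# Drill 3: fill missing values in one column only (here, "units" with 0)
units_filled = sales.fillna({"units": 0})
```

Each drill is deliberately tiny, so you can repeat it many times in a 20-minute session.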

But once you learn the syntax for the critical functions (there are about 15 to 20 pandas functions that you need to master), you need to learn how to put those tools together to do actual work.

One of the best ways to do that is to work on a project.
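To sketch what "putting the tools together" could look like, here is a small, hypothetical mini-project: clean a made-up sales table, compute revenue, and summarize it by region. It chains the same kinds of operations you would drill individually, plus `assign()` and `groupby()`. The data and the decision to treat missing units as zero sales are assumptions for illustration, not a recommended cleaning rule.

```python
import numpy as np
import pandas as pd

# Made-up sales data (hypothetical columns and values)
sales = pd.DataFrame({
    "region": ["East", "West", "East", "North", "West"],
    "units": [10, np.nan, 7, 3, 12],
    "price": [2.5, 3.0, np.nan, 4.0, 3.5],
})

# Chain individually drilled steps into one small analysis
revenue_by_region = (
    sales
    .dropna(subset=["price"])        # drop rows where the price is unknown
    .fillna({"units": 0})            # assumption: missing units mean zero sales
    .assign(revenue=lambda df: df["units"] * df["price"])  # add a revenue column
    .groupby("region", as_index=False)["revenue"]
    .sum()                           # total revenue per region
    .sort_values("revenue", ascending=False)  # largest region first
)

print(revenue_by_region)
```

None of these steps is hard on its own; the project work is in deciding which steps to combine and in what order.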

Writing your own code is hard

Once again, though, the best way looks hard.

When most people start working on a project, they get stuck somewhere.

There are at least two solutions to this: power through and find the solution, or choose a simpler project.

Now, there is a bit of an art to selecting a good project. You need to select a project that’s just on the edge of your skill level. If you’re getting a little bit stuck and getting pushed a little bit out of your comfort zone, that’s good.

But the other response to getting stuck is to stay inside your comfort zone by consistently choosing projects that are easy. You won't improve this way.

Getting out of your comfort zone – selecting a project that’s just at the edge of your ability – is hard.

But consistently pushing yourself into challenging territory is how you improve.

The hard way is the easy way.

To master data science, you need feedback

Increasingly, I think that data science students need feedback in order to master data science, and especially to master it fast.

Data science is not strictly a science.

There’s a lot of art involved:

… writing clean code

… designing insightful data visualizations

… solving problems in a clear and effective way

There's a lot of nuance, and a lot of things that are hard to learn on your own. It's possible, but it takes 10X more time to figure everything out on your own (and there's a risk that you might never learn these nuances by yourself).

The best way to learn these details is through feedback. Having a coach or mentor can help you learn many of the nuances and small details of data science that will take you from a junior level to a senior level.

Feedback is hard

Again though, feedback is hard.

Having someone point out the bad parts of your code is uncomfortable.

Listening as someone suggests that you should completely redo a data visualization is going to be painful.

You might not like having an expert show you how you could have saved yourself a few hours by doing things differently.

Getting critical feedback often feels bad. It stings the ego.

Taking feedback is hard.

But at some level, it’s the best way to progress. A skilled coach or mentor can accelerate your progress by zeroing in on your biggest mistakes, which will become your biggest opportunities for growth.

Taking feedback is hard, but in the end, getting feedback is the easy way.

Easy choices, hard life … hard choices, easy life

The hard path looks easy.

It looks easy to skip mastering syntax. It looks easy to choose “easy” projects. It looks easy (i.e., less painful) to avoid feedback from experts.

But those easy choices will leave you a mediocre data science wannabe who keeps struggling. Trust me. I've seen it.

On the other hand, if you choose the hard things – memorizing syntax, pushing beyond your comfort zone, getting useful feedback – they will help you master data science much, much faster.

And you’ll get the benefits of being a top tier data scientist much faster too.

Which will you choose?

The hard path or the easy path?
