A common question I get from beginning data science students is “what skills do I need?” Beginners want to know what skills they need in the long run, and also what data science skill is important to focus on first.
In response, many people give these beginners a long list of data science skills that they need to learn:
- Machine learning
- Deep learning
- Data analysis
- Data visualization
… the list could go on at some length.
Some of these skills are more important than others, and some are a waste of time for beginners. You’ll need to be strategic about which skills you learn and which you avoid: a few are essential for a beginner, while others are best reserved for advanced data scientists.
But there’s one skill that’s more important than almost any other data science skill.
… And it’s something that you’ll almost never hear about.
In the final analysis, data scientists are not hired to create models, analyze data, create presentations, or any of those things. In fact, in some sense, models, analyses, software, and presentations are obstacles that stand in the way of what businesses really need.
Ultimately, you’re not hired to create models, analyses, and presentations.
You’re hired to create business value.
The most important data science skill is creating business value
Data scientists are hired to create business value, not analyze things.
To give credit where credit is due, this is a rephrasing of a quote by the “internet famous” programmer and entrepreneur, Patrick McKenzie. A few years ago, Patrick said that programmers are “hired to create business value, not to program things.”
It’s basically the same with data science.
When you get hired as a data scientist, you are hired for the value you can potentially generate for the organization.
HR representatives and hiring managers say that they want machine learning, visualization, and analyses, but what they really need is business value.
This is an extremely important distinction to make, because it’s possible to spend hundreds of hours on a data science project that contributes zero value.
It’s also possible to spend 15 minutes on a quick-and-dirty SQL data pull that can save a business thousands, even millions of dollars.
Now, I don’t want you to get confused: as a data scientist, you are hired to create business value with the tools of data science … analyses, models, data management, etc. But your employability and your salary will entirely depend on how you use those tools. If you use those tools in a value-heavy way, companies will fight over you and you’ll be able to command a high salary. If you can’t create value with those tools, you’ll be stuck forever in a low-salary position. Even worse, if you can’t create business value, you’ll be stuck in an unpaid internship. If you can’t create value with the tools of data science, you’ll sadly be one of the bottom 80% who struggle to succeed in the data industry.
It’s not strictly about deliverables. It’s not strictly about what programming languages you know.
You’ll be paid in proportion to the value you create.
An embarrassing story: an analysis that contributed negative value
In my earlier days as a young data scientist, I didn’t understand this point. Let me tell you a story about a personal data fail, where I did not actually generate value with my deliverables.
Several years ago, I worked as a data scientist and strategist at a large ad agency. (Note: this is why I refuse to watch Mad Men … I lived Mad Men for a while.)
Some of my business partners were working on a large project for a major corporate client. They were doing an overhaul of the client’s marketing, starting with demographic analysis, segmentation, and a few other deep dive analyses.
They had worked for a few weeks and had made some progress, but in spite of their initial progress, the project was incomplete. Worse yet, the deadline for a client presentation was coming up; they still needed more insight in order to deliver targeted recommendations.
About 3 days before the presentation to the client, they asked me for a fairly detailed analysis of the client’s customer list and past marketing efforts.
Now, in my defense, it was a terribly tight time window. Unreasonable, in fact. But, it had to be done.
When I talked to them in order to flesh out the project requirements, my business partners said that they wanted an analysis with over 100 plots, charts, and tables. I’ll spare you the full details, but they basically just wanted a lot of charts.
“Ok,” I told them.
That was that. You want over 100 charts and graphs? Fine. Done.
I gave them exactly what they asked for.
They spent the following 48 hours trying to make sense of it all, including sleeping on the floor in the office overnight. It left them even more confused than when they started.
They were pissed. And the client was pissed. And my manager was pissed. In fact, it was hard to find a person in the office that week who wasn’t pissed off at me.
The problem is that my output produced zero value for my business partners. In fact, you could make the argument that it produced negative value, because they lost time and I lost time. Moreover, they ended up more confused than at the beginning of the project.
They asked for charts, but what they needed was help. They asked for graphs, but what they needed was insight. They asked for an analysis, but what they needed was value, both for themselves and ultimately for the client.
Instead of giving them something of value, I gave them a stack of charts and graphs that just caused them to question their life decisions.
It was a mistake.
4 ways to create business value as a data scientist
I don’t want you to get confused.
This is a data science blog, and I initially told you that “you’re actually not hired to create models, analyze data, create presentations, or any of those things.”
Don’t take that out of context.
The fact is, as a data scientist, you do need to create models, analyze data, and create presentations, but you do so in service of creating business value.
That’s the real purpose of your deliverables. Creating business value. Being able to create business value is the most important data science skill, and you can’t take your eye off of that. Ever.
Having said that, let’s go into a little more detail about how you can create business value as a data scientist. These are particularly useful for junior data scientists. (Advanced data scientists may use other more advanced techniques.)
Find insights in data
“Finding insights” in data is one of the core methods of creating business value in a company.
As a budding data scientist, you’ll hear this term repeatedly.
For example, you’ll read about it in job ads. Hiring managers and HR reps demand that you be able to “find insights in data” or “discover data-driven insights.”
Once you get hired, your business partners and management team will also routinely ask you for “insights” into particular business problems.
The phrase “finding insights” confuses many young data scientists because it’s not entirely clear what this means. At its core, finding insights is mostly about analyzing data to find recommendations that can add value to the business. It’s mostly about data analysis and data storytelling.
If you’re new to data science or a junior member of a data science team, this is where you should be focussing your efforts. The reality is, as a junior data scientist, you probably won’t be put on a large or complicated project. You probably won’t be working on advanced machine learning methods (although there are exceptions). More often than not, the best way to add value as a junior data scientist is to analyze data and find recommendations that can improve metrics that are important to the business.
This is harder than it sounds, but it becomes relatively easy once you know how to use the tools of data analysis.
Long-time readers of the Sharp Sight blog will know that “finding insights” via data analysis largely consists of applying basic data visualization tools. Typically, you’ll need to create charts, graphs, and visualizations to uncover patterns and opportunities.
Importantly though, you can’t make dozens of charts and graphs just for the hell of it. Typically, you need to apply the toolset strategically to look for opportunities to improve key performance indicators (i.e., KPIs).
So for example, if you’re in a marketing department, you’ll commonly want to optimize things like return on investment, or the number of new customers. If you’re working in customer service analytics, you’ll be looking for ways to improve “customer satisfaction.” If you’re in finance, you might be looking for opportunities to optimize one of several profitability metrics.
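To make the KPI idea a little more concrete, here is a minimal sketch in Python that ranks marketing channels by return on investment. The channel names and dollar figures are invented for illustration, and `roi` is a hypothetical helper of my own, not part of any library. It assumes the common definition ROI = (revenue - spend) / spend; your business may define it differently.

```python
# Hypothetical campaign data: (channel, spend in dollars, attributed revenue in dollars).
# These numbers are invented purely for illustration.
campaigns = [
    ("email", 2_000, 9_000),
    ("paid_search", 10_000, 18_000),
    ("display_ads", 8_000, 6_000),
]


def roi(spend, revenue):
    """Return on investment as a fraction: (revenue - spend) / spend."""
    return (revenue - spend) / spend


# Compute the KPI for every channel, then sort from best to worst.
ranked = sorted(
    ((channel, roi(spend, revenue)) for channel, spend, revenue in campaigns),
    key=lambda pair: pair[1],
    reverse=True,
)

# Print one line per channel, with ROI shown as a signed percentage.
for channel, value in ranked:
    print(f"{channel:<12} ROI = {value:+.0%}")
```

A ranked table like this is only the starting point. The insight is the recommendation that follows from it, such as asking why one channel loses money before anyone decides to move budget around.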
Although this is often harder than it sounds at first blush, finding insights in data is not haphazard. There are high-level processes that you can use to systematically uncover these sorts of insights.
Before you learn the process for finding insights in data though, you need to master the data analysis toolkit. Long time readers of this blog will know exactly what I’m talking about:
- The bar chart
- The line chart
- The histogram
- The scatterplot
- The small multiple
Do. Not. Neglect. These.
As I commonly point out, many people want to skip these basic skills and move on to the advanced material right away. Don’t.
This is low-hanging fruit, people. Many data scientists fail to master the basic toolkit, and they are terrible at finding insights in data.
Meanwhile, businesses are pretty desperate for data-driven insights.
If you can give them what they want – if you can find insights and create value for the business – they’ll pay.
Optimize systems and processes
As I already noted, the skill of optimizing systems and processes goes hand in hand with “finding insights.”
Typically, businesses are trying to optimize for a set of key metrics (KPIs). Moreover, at a high level, almost every business is trying to optimize for profits, shareholder value, revenue, or market share.
Drilling down into specific departments, marketers are trying to optimize for ROI or conversion rates. Sales teams are trying to optimize for lead generation and conversion rates. Finance teams have a suite of financial metrics that they may be trying to optimize (depending on the goals and financial condition of the business). Operations teams may be optimizing for productivity metrics or a financial metric (like profit margin). Engineering teams creating a physical product may be trying to optimize the performance of some element of that product, like fuel efficiency in a car.
As businesses collect larger quantities of data in all areas of a business, they want to use that data to improve or optimize almost everything. Remember, in many businesses today, employees and managers get paid in relation to whether they hit their goals for a particular KPI. If you help them improve their performance on that KPI and get paid, there’s a good chance that you will get paid too.
So, you will often need to think in terms of optimization. This is strongly connected to “finding insights.” You’ll need to find insights in data, but you’ll often need to do so with a particular metric or metrics in mind. Keep your eye on the ball.
Make predictions
Predictions are a tricky thing. They are hard to do right and easy to get wrong. You need to be careful with predictions.
The problem with predictions is that if you get them wrong, everyone is pissed off at you. But if you get them right, people often just say “thanks for doing your f*^king job.” Essentially, the prediction game often has low upside, but potentially high downside.
Having said that, there are some domains where it is relatively easy to make decent predictions, and the downside of getting it wrong isn’t too terrible.
As a junior data scientist, you’ll need a few tools for making predictions. The best are simple, tried-and-true methods: linear regression and logistic regression.
I know that some of you will protest. Linear regression and logistic regression aren’t sexy. I get it. Saying to your friends that you “do linear regression” isn’t half as cool as saying, “yeah brah, I’m building a deep network at work to categorize cat pictures.”
But even though they aren’t as sexy as some modern tools, linear regression and logistic regression are true workhorses. They’re “good enough” for a variety of problems, and they are easy to understand and interpret. The ability to interpret them can be surprisingly important; black-box methods can be a huge risk in some industries, so it can be useful to use easy-to-interpret methods.
Moreover, linear regression and logistic regression are often just easier to use as a beginner. If you want to get a junior data science job, I highly recommend that you learn linear regression and logistic regression.
I’d also recommend decision trees. Like linear regression and logistic regression, decision trees are relatively easy to interpret. Trees are also very forgiving. In some cases, they are resistant to problematic data like correlated data, data with missing values, skewed data, etc.
I could write at some length about the best tools for making predictions, but as a junior data science hopeful, stick to a few tried-and-true techniques.
If you can learn these and learn how to apply them, you will have a good toolset for making valuable predictions.
Communicate clearly
Another way you can create value is by being a great communicator.
Similar to “finding insights in data,” being a good communicator is really low-hanging fruit, because many data scientists don’t do it well.
I hate to reinforce the stereotype of the poorly-spoken science nerd who can’t communicate well, but it’s sort of true. Anecdotally, I’d estimate that one third to one half of my past co-workers were terrible communicators. Some of these people were brilliant coders and analysts, but if you asked them to package their recommendations into a PowerPoint deck or you put them in front of a management team, they were, frankly, terrible.
Let the ignoble, mild social awkwardness of your competitors be your opportunity, young data science hopeful. This is essentially a gap in the market. There is a limited number of data scientists who are both good at data skills and good at communicating. Being good at both is a way to differentiate yourself.
Moreover, being able to present your findings to partners and management teams is valuable in and of itself. Decision makers will have to spend less time trying to figure out what the &^#* you’re talking about. If you communicate clearly, you will be able to deliver your data-driven recommendations and insights quickly and concisely. In turn, your partners will be able to take action more quickly to make valuable improvements to the business.
So speak well. Present well. Communicate well.
If you can, you’ll get paid well.
Creating value isn’t enough … you need to communicate your value
However, it’s not entirely as simple as “create value, get paid.”
… It is a little more complicated. Creating large amounts of value isn’t enough.
In addition to creating value, you also need to communicate your value.
This is actually very close to the idea of “signaling” in economic theory and evolutionary biology.
For example, take a peacock with his plume of feathers. A large and colorful display of feathers is a signal to peahens that the peacock is fit, healthy, and successful. Peacock feathers are a signal of mating fitness. They are a signal of value.
In the case of a peacock, it’s not enough to just have great genes and general fitness. The peacock actually needs to display his fitness in order to attract a mate. The peacock needs to be evolutionarily valuable, but he also needs to communicate that value. He needs to signal his value (or at least, his physiology and genes do it on his behalf).
Learn from the peacock. It’s not enough to simply be a valuable data scientist. You need to signal your value to potential employers.
You need to cut through the noise and clearly signal your value
Communicating and signaling your value is even more important in today’s hiring and business environment.
Like it or not, you’re competing against tens of thousands of wannabe data scientists.
The truth is that 80% of these data science students are terrible at creating value. They are not actually strong competition. Many people in this bottom 80% can’t actually write data science code. Many of these people are what I call “cut and paste” coders; they just do endless Google searches to find little code snippets. They cobble together programs by cutting and pasting small pieces of code, and tinker with them until they run without errors. These people are not actually “fluent” in writing data science code, like they need to be. They can’t actually write code, and they can’t create business value.
Now I want to be clear, every data scientist does google searches every now and again. Even good data scientists forget things sometimes. And to get better at data science, you’ll need to push your limits. When you do, you’ll eventually need to seek out resources to learn more advanced material. To push yourself, you’ll need to do searches. That’s normal.
But there is a large set of wannabe data scientists that can’t write code to accomplish even basic tasks. They can’t import a dataset. They can’t clean and “wrangle” a dataset. They can’t analyze data fluently and in a systematic way. They can’t actually do the work. They just search for code snippets and cut and paste.
These people will struggle to actually create business value. To the extent that they can create business value they only do it in small amounts and at a very slow pace. I hate to say it, but if you want a real data science job, that’s not good enough.
The issue here is that even though 80% of wannabe data scientists can’t create business value, they can fake it. They can make it look like they can create value.
For example, it’s very easy today to go online and effectively buy a certificate that says you know dplyr. There are many online course vendors that simply issue a certificate at the end of a purchased course that says “John Smith has completed a course in …”
It gets worse. There are major universities that have “data science master’s degrees” that don’t fully train people in hard data science skills. I know a student who completed a $30,000 data science master’s course from an elite university, but at the end he still couldn’t write data science code. He had a diploma that suggested data science skill, but (by his own admission) he didn’t actually have skill. The diploma signaled value where none actually existed.
This will be a critical problem for you.
It’s not enough to be able to create business value for your company or client. You need to be able to effectively signal that you can create value. And you need to create this “honest signal” of value in a hiring environment with a lot of noise. You’ll need to cut through the noise of the “me too,” cut-and-paste, wannabe data scientists who claim they can actually do the work.
It’s a hard problem. You need to be valuable, but also signal value honestly and clearly.
Signal value with data visualization
Long time readers of this blog will know that I strongly recommend that beginning data science students learn data visualization first.
There are several reasons for this, but one of the most important reasons is that you can use data visualization to create compelling signals of your data skill.
Let’s go back to the peacock analogy.
A great, beautiful data visualization is like peacock feathers. If you do it right, you can create a compelling display of your data skill and signal the value you might be able to create.
Now, I want to be clear that using data visualization like this is not a perfect “signal.” People can copy-and-paste code to create an impressive visualization, so it is not necessarily an “honest” signal in all cases.
But if you’ve truly developed the skill to create one yourself, then you can use it to display and signal your value.
Signal value through displays of mastery
As I just noted: some people will just copy and paste code to create data visualizations. They will use a shortcut to create compelling visualizations and pad their portfolio. They will try to fake having real skill.
There are a lot of these “me too” data scientists right now, so you need to do more. You need to be able to write code fluently. You need to know what you’re doing, and you need to be able to show it off when the time is right.
This is one of the reasons that I’ve recommended that you become “fluent” in writing data science code.
Being able to write data science code fluently, rapidly, and from memory will enable you to display your skill.
This is particularly useful to do in an interview. One interview tactic is to bring a laptop, open it up, and write some code on the fly. When a hiring manager sees you write code clearly and fluently, you will have a much better chance of getting the job.
Critically, you want to show mastery of basic techniques. You can tell an interviewer that you intend to help them identify “data driven insights.” But then you need to show them. Code up a scatterplot and a bar chart. Show your stuff.
Like playing an instrument, writing actual code on a keyboard is a physical skill, and it’s essentially impossible to fake.
By writing code in-person for potential employers and clients, you reinforce the perception that you can actually get things done. You show them that you can actually do the work. You provide an honest signal that you can create value.
Build a reputation
Over time, you’ll want to build a reputation.
This is a longer term project, and you’ll need to think about the reputation that you intend to build. It’s important though, because what other people say about you will typically carry more weight than what you say about yourself.
There are several things that you’ll want to develop in your reputation. You’ll want to develop a reputation for being a reliable person; a trusted advisor; a person who “gets things done” (particularly in a crunch). You want to develop a reputation for being someone who creates value for business partners, teams, and companies. The more you can quantify this the better, so you want to be able to express this in terms of metrics and quantifiable improvements to a business.
Ultimately though, your skills and achievements will be best expressed by other people.
So, learn to create value. Learn to signal value. But also develop relationships with people who will be advocates for you and your great work. (Hint: the best way to do this is to serve your clients well in the first place by creating massive value for them and the business.)
You just read several thousand words on creating value and signaling value as a data scientist.
Here are some action steps:
- Master basic charts and graphs. If you haven’t mastered the basics, like the bar chart, line chart, histogram, scatterplot, and small multiple, go back and learn them. Practice them. You should be able to create these fast without even thinking about it. As I mentioned earlier, this is low-hanging fruit. They are “basic” tools, but you can still use them to create massive value for businesses.
- Learn a little basic machine learning. Start with the basics. Don’t go overboard, especially if you’re not fluent yet in the basics. Learn linear regression and logistic regression. If you have time, learn the basics of decision trees.
- Learn to communicate your findings. In particular, you want to learn to communicate clearly in your emails. Learn the fundamentals of designing good presentations. Consider taking a class on public speaking.
- Learn to write code fluently, and from memory. You need to be able to signal your value. One hard-to-fake way of doing this is by being able to write code fluently, on-the-fly, and in person. Being able to do this is a great way to signal that you can create value … you can show people physically that you are capable of doing data science work.
Learn to create business value with data
To get hired as a data scientist, you need to be able to create business value.
You also need to be able to communicate your value.
It’s hard to learn to do this. Don’t try to do it alone.
Sign up for our email list, and learn the tools and techniques you’ll need to become a valuable resource on a data science team.
When you sign up, you’ll get weekly tutorials delivered to your inbox.
Moreover, if you sign up now, you’ll get access to our FREE Data Science Crash Course.
In the Data Science Crash Course, you’ll learn:
- a step-by-step data science learning plan
- the 1 programming language you need to learn
- 3 essential data visualizations
- how to do data manipulation in R
- how to get started with machine learning
- the difference between machine learning and statistics