You don’t need to know much math for data science

Recently, a Sharp Sight blog reader emailed me and asked for advice about data science prerequisites.

He was nervous about math.

Someone had told him that in order to study data science, he needed to learn a long list of math topics first:

  • Precalculus
  • Calculus
  • Multivariable calculus
  • Trigonometry
  • Linear algebra
  • Differential equations
  • Statistics

Although there are a few items on the list that I hope you’ve learned by the time you’re out of college, you might be surprised about which ones are truly necessary to get started with data science.

As a beginner, you don’t need that much math for data science

The truth is, practical data science doesn’t require very much math at all. It requires some (which we’ll get to in a moment) but a great deal of practical data science only requires skill in using the right tools. Data science does not necessarily require you to understand the mathematical details of those tools.

That being said, it’s important to understand that there’s a difference between the theory that underpins data science, and data science as it is practiced. This makes all the difference.

There’s a difference between theory and practice

When talking about how much math you need for data science, it’s important to distinguish between data science “theory” and data science “practice.” When I say theory, I’m referring to data science as it’s studied in an academic environment for research purposes.

This theoretical data science is often very different from the practical data science performed in business or industry. The priorities differ, and so do the deliverables. Academics mostly produce papers and novel research, whereas data scientists in business or industry produce reports and analyses (commonly in the form of PowerPoint presentations), models, and software systems. Because the deliverables are different, the required tools are different too.

This is not to disparage academics. Far from it. People researching data science in universities are needed to push the field forward. But you need to understand that there are differences between theoretical data science and data science as it’s practiced in business or industry.

There’s a difference between junior data scientists and senior data scientists

There’s also a second distinction we need to make that’s relevant to how much math is needed for data science. There is a difference between junior data scientists and senior data scientists.

Junior data scientists often don’t need the same depth of knowledge as senior data scientists.

I hate to break this to you, but when you get hired as a junior data scientist, you probably won’t be working on the coolest, sexiest projects first. On the contrary, you’ll probably have to do “grunt work” for your first 6-18 months.

At best, many new data scientists will be tasked with very simple projects. Such projects may be simple analyses and simple reports.

At worst, you’ll be pulling data and cleaning data for the more senior members of the team.

But either way, the fact is that as a junior data scientist, it’s pretty likely that you’ll be doing grunt work, and not working on advanced projects. It’s not that bad, but you need to set your expectations and also prepare accordingly.

Foundational data science doesn’t require much math

As a junior data scientist working in business or industry, you will primarily need to work with what I call the “foundational” skills of data science.

By “foundational” skills, I mean the fundamental skills you will need as a practitioner. Here at Sharp Sight, we sometimes call these the “core” data science skills, or “the foundations.”

What are they?

  • Data manipulation
  • Data visualization
  • Data analysis (AKA, exploratory data analysis)

Why are these the core skills? Because in a practical setting, almost everything else relies on these. These are the “core” practical skills for “getting things done.”

You’ve probably heard the rule of thumb that 80% of your work will be data manipulation or data cleaning. Although we might argue with the exact proportion, 80% sounds close to me. A very large share of your time will go to collecting data from a variety of sources like text files, spreadsheets, and databases; cleaning that data; and performing basic exploratory data analysis.

Almost all deliverables will require these skills, especially if you’re working in a junior role. Reporting requires the core data science skills. Data analysis requires core data science skills. Building machine learning models requires core data science skills. For almost all deliverables, you’ll need to use data manipulation, visualization, and/or data analysis.

But how much math do you need for these core skills?

Very little.

This fact runs against the common narrative that data science requires a lot of math knowledge. The truth is, most of these basic skills can be learned without learning math beforehand.

What math you need for “core” data science skills

So how much math do you need for the “foundational skills” of data science?

I think that a smart first-year college student has enough math knowledge to perform the “core” data science skills. You heard that right. You basically only need the sort of lower level algebra and simple statistics that you would have learned in grades 8 to 12.

Let me give you an example.

Example: getting data and cleaning data

Take creating new variables, a routine part of cleaning data. This requires almost no math skill.

Let’s say we have a dataframe with two simple variables, x and y. We want to create a new variable that equals x divided by y. We’ll call this new variable new_var.

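# R CODE TO CALCULATE A NEW VARIABLE IN A DATA FRAME

library(tidyverse)

# Toy data frame with two numeric columns
df <- tibble(
  x = 1:3,
  y = c(10, 5, 10)
)

# Add new_var as the ratio of x to y and save the result back to df
df <- df %>%
  mutate(new_var = x / y)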

This is a very simple operation, but it’s representative of the sort of data manipulation you will need to do.

Does this require calculus? Linear algebra? No.

When you’re creating new variables, you typically don’t need to do anything complicated. It’s commonly no more complicated than dividing one variable by another, or maybe performing a basic statistical manipulation like calculating a mean.

Are there exceptions? Yes. Are there cases where you need to do a complex computation to create a new variable? Yes. But they are much more rare, especially for beginners.

To put it simply, I would estimate that 95% of all data manipulation requires only simple math.
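The other manipulation mentioned above, calculating a mean, is about as simple. Here is a minimal sketch that reuses the toy values from the earlier example; the names result, mean_x, and mean_new_var are just illustrative choices, not anything standard.

```r
# Minimal sketch: summarise columns with a mean (toy data; names are illustrative)
library(dplyr)

df <- tibble(x = 1:3, y = c(10, 5, 10))

# Add the ratio column, then collapse the data frame to one row of means
result <- df %>%
  mutate(new_var = x / y) %>%
  summarise(mean_x = mean(x), mean_new_var = mean(new_var))

print(result)
```

The only math involved is division and averaging; everything else is syntax.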

Example: a scatterplot doesn’t require advanced math

Here’s another example.

A very common data task is creating basic charts and graphs for exploratory data analysis. This essentially amounts to using simple data visualization techniques for the purpose of data analysis.

One of the most common is creating scatterplots:

# R CODE TO CREATE A SCATTERPLOT

mtcars %>% 
  ggplot(aes(x = disp, y = hp)) +
    geom_point()

A simple scatterplot which shows that you don't need advanced math for data science.

Ask yourself, do you really need calculus for this? How about linear algebra?

(No. The answer is no.)

To create this scatterplot, you don’t need college-level math. You only need basic math. If you’ve taken 6th grade math and you know what the Cartesian coordinate system is, you’re halfway there. This is not hard … not mathematically.

The hard part about this scatterplot is the syntax. To do data science – to really get things done – you need to master syntax, not math. Of course, you need to be able to apply the syntax and use visual tools the correct way, but that still doesn’t require calculus.

Example: a histogram doesn’t require advanced math

Another example. Let’s create a histogram:

# R CODE TO CREATE A HISTOGRAM

diamonds %>% 
  ggplot(aes(x = x)) +
    geom_histogram()

A simple histogram that also demonstrates that you don't need advanced math to get started with data science.

Once again, this does not require advanced math. Of course, you need to know what a histogram is, but a smart person can learn and understand histograms within about 30 minutes. They are not complicated.
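To see how little is going on under the hood, here is a rough sketch of what a histogram computes: it sorts values into bins and counts them. The data and the bin edges below are made up for illustration, and this is base R rather than what geom_histogram() does internally.

```r
# Minimal sketch: a histogram is just counts of values per bin (made-up data)
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)

# Sort each value into one of three bins: (0,2], (2,4], (4,6]
bins <- cut(x, breaks = c(0, 2, 4, 6))

# Count how many values landed in each bin; these counts are the bar heights
counts <- table(bins)
print(counts)
```

Each bar in the chart is simply one of those counts drawn as a rectangle.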

For the most part, if you’re getting started, then core data science skills like data manipulation and data visualization won’t require advanced math. Algebra and basic problem solving skills are probably enough to get started.

You don’t need much math to start learning machine learning

Ok, so the “foundational” skills don’t require advanced math like calculus and linear algebra, but surely machine learning does, right?

I’m going to make a bold claim.

You need almost zero advanced mathematics to get started with machine learning.

Basic machine learning requires limited math

I will admit that machine learning requires more math than data manipulation and data visualization.

Having said that, machine learning does not necessarily require advanced math like calculus and linear algebra.

This will confuse many people. Once again, most people hear that they need to know advanced math before they can start studying machine learning.

What’s going on here?

There’s a difference between theory and practice

Let’s go back to one of the key distinctions I made at the beginning of this blog post:

There’s a difference between theory and practice. That applies generally to data science, but also specifically to machine learning.

On the theory side (meaning, in an academic environment) machine learning requires a lot of math. Seriously, just read a few machine learning papers and it’s really obvious: there’s a lot of math. Calculus, linear algebra, statistics, the occasional reference to information theory …. Machine learning papers use a lot of math.

But how about practitioners?

On the whole, practitioners use a lot less math when doing machine learning (although there are exceptions).

The difference between theory and practice becomes even more stark when we re-consider the other distinction I made at the beginning of the article: the distinction between junior and senior data scientists. Although you can find senior data scientists who sometimes need advanced math to solve a machine learning problem, junior data scientists almost never need this.

Keep in mind, it’s somewhat common that junior data scientists won’t work on ML projects at all. Remember when I said that junior data scientists often get stuck with the “grunt work”? If you’re the junior member of a data science team, you might not be working on a machine learning project anyway.

But if you do work on a machine learning project, how much advanced math do you need?

Not much at all.

You can learn many machine learning topics without advanced math

For almost every machine learning algorithm, you can learn how the algorithm works without any knowledge of calculus or linear algebra whatsoever.

As a case in point, I recommend that you find a copy of the well known machine learning textbook, An Introduction to Statistical Learning. Many people, myself included, consider this to be the best introduction to machine learning that’s available (although the authors use the term “statistical learning”).

ISL (as the book is often called) provides a broad overview of machine learning techniques. In this book, you’ll find explanations of almost every major tool and technique:

  • Linear regression
  • Logistic regression
  • Support vector machines
  • Decision trees
  • Neural networks
  • Regularization (lasso and ridge regression)
  • Feature extraction techniques, like principal component analysis

Again, this book provides a broad overview of the most important machine learning techniques.

Do you know how much calculus and linear algebra it uses?

Almost zero.

And it uses very few concepts from statistics or computer science.

Here’s a quote by Larry Wasserman, a professor of Machine Learning at Carnegie Mellon (one of the best universities for machine learning):

This book provides clear and intuitive guidance on how to implement cutting edge statistical and machine learning methods. ISL makes modern methods accessible to a wide audience without requiring a background in Statistics or Computer Science.

Larry Wasserman, Professor, Department of Statistics and Department of Machine Learning, Carnegie Mellon University.

Read that again.

An Introduction to Statistical Learning – which is regarded as one of the best introductory books about machine learning – does not require a background in statistics or computer science.

And I can tell you from my own experience that calculus is not required either. The only reference to calculus that I’ve found was in the section concerning smoothing splines. (There’s an integral used in the smoothing spline equation.) This book is several hundred pages long, and there’s only one minor reference to calculus. That’s it.

I’ll say it again: you don’t need advanced math to get started with machine learning. You don’t need advanced math to get started with data science. You don’t need calculus or linear algebra. You can learn the essentials of machine learning with rather limited math background.

You can be an excellent machine learning practitioner without much math knowledge

What this means is that you can learn a lot of practical machine learning without advanced math.

In fact, I’m going to make a bold claim: you can become a very strong machine learning practitioner without knowing much advanced math.

Personally, I know quite a few excellent machine learning practitioners who do not have advanced math training. One works at Apple. Another works at Bank of America. Granted, they are both pretty smart, but neither is a math genius. They just know how to apply the techniques (and they get paid well into 6 figures to do so).

This should be very encouraging to you. There are many very successful machine learning practitioners who know very little advanced math. If you work hard, you can be one of them.

The math skills you actually need to start learning data science

If you want to learn data science, stop worrying about math. You need a lot less math than you probably expect.

If you’re just getting started with data science, here’s what you need to know:

Basic charts and graphs

You need to understand basic Cartesian plotting. If you’ve drawn plots on graph paper in an elementary math class, you probably know enough to get started.

Functions

You need to know what a function is and how to plot one. In this case, when I say “how to plot” I just mean that you need to know generally how to plot a function … what the process is. However, you don’t necessarily need to be able to plot a function in R or Python. If you’ve ever plotted a function by hand on graph paper, you probably know enough.

Basic algebra

You need to know what a variable is. You need to know what an exponent is. You’ll need to be comfortable at a basic level reading math equations.

Basic statistics

To be honest, you don’t need to know that much statistics. You definitely need to be familiar with basic statistical calculations like mean, median, standard deviation, and variance.
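If you want to check your familiarity, here is a short sketch of those four calculations in base R. The values in x are made up for illustration. One detail worth knowing: R’s var() and sd() compute the sample versions, which divide by n - 1 rather than n.

```r
# Minimal sketch: the basic summary statistics named above, on made-up values
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

x_mean   <- mean(x)    # arithmetic average: 5
x_median <- median(x)  # middle value of the sorted data: 4.5
x_var    <- var(x)     # sample variance (divides by n - 1)
x_sd     <- sd(x)      # sample standard deviation, the square root of var(x)

print(c(mean = x_mean, median = x_median, variance = x_var, sd = x_sd))
```

If you can read those four lines and roughly say what each number means, you know enough statistics to get started.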

Basic math notation

You also need to be familiar with standard math notation. You need to know about variables, exponents, subscripts, and other minor items like parentheses. You should also be familiar with summation notation. Summation notation (AKA, sigma notation) appears quite a bit in machine learning.
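As a quick illustration of sigma notation, here are the mean and the sample variance of n values written out, followed by a tiny worked case. The symbols (x_i for the i-th value, n for the number of values) are the usual textbook conventions, chosen here for the example.

```latex
% Mean and sample variance of n values x_1, ..., x_n
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2

% Worked case with the three values 1, 2, 3 (so n = 3)
\sum_{i=1}^{3} x_i = 1 + 2 + 3 = 6
\qquad
\bar{x} = \frac{6}{3} = 2
\qquad
s^2 = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3 - 1} = \frac{2}{2} = 1
```

The sigma just means “add these up”; the subscript and superscript tell you where the index starts and stops.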

Still have questions? Leave a comment below.

This is mostly all you need to get started learning data science.

If you’re smart and motivated, you can learn almost everything else if you know those simple mathematical foundations.

Do you have more concerns? Leave a comment below …

Joshua Ebner

Joshua Ebner is the founder, CEO, and Chief Data Scientist of Sharp Sight. Prior to founding the company, Josh worked as a Data Scientist at Apple. He has a degree in Physics from Cornell University. For more daily data science advice, follow Josh on LinkedIn.

25 thoughts on “You don’t need to know much math for data science”

  1. Very good post!

    How much biochemistry does a medical doctor need to recommend medicines?

    How much trigonometry does a baseball player need to know to hit home runs?

    Just some questions for you to spice up your post a little bit.

    • I think you are right. Helped me make up my mind regards pursuing a standalone in Math. I just got hired as an AI Consultant. Background is 3 degrees across Econ and CS.
      On paper, I look quantitative. In reality, I can read applied math and follow it but I am very rusty. Also, never did courses in discrete/real analysis/pure math which kind of makes me feel like an imposter.

      Anyhow, going to focus on Python and Azure ML services and brush up on the applied as you say in the article.

      Heads up. I got hired cause I wrote CS postgraduate research in AI, not afraid of programming and I know how to communicate.

  2. You are really doing a great job. There are many people who want to get into data science/ML but they are just scared of math. And a few people are also afraid of coding/programming.
    Please keep up the great job.

    • Thanks so much for simplifying the importance of practice and application. A good master practitioner can go on to mastering the principles and a Africa if he so wishes.

  3. I don’t agree with this. You are talking about junior data scientists, but isn’t that just a quick entry? What will happen when they become senior data scientists? Currently, as you said, people get into the field based on this idea, but then they struggle a lot to understand the difference between models and what p-value, significance, bias, variance, etc. mean.

    Confidence is very much related to depth of knowledge in my view point and data scientists with strong fundamentals are easily distinguishable from the rest.

    • You’re making some assumptions about things that are mostly not accurate.

      First of all, it takes several years for a junior data scientist to reach senior levels. 5 to 8 years in a reasonably sized business with a clear hierarchy (although possibly faster in a smaller business with fewer levels, etc). 5 to 8 years is a lot of time to “backfill” any math knowledge that you might have missed.

      Second, even at senior levels, many data scientists still don’t use much math. For the most part, you don’t need to know calculus or linear algebra to build a machine learning model. That’s still true even for most intermediate and senior data scientists.

      Having said that, there is a small minority of senior data scientists that are creating custom-built models from scratch, or are modifying pre-existing tools. For those tasks, a person probably will need to know calculus and linear algebra. But those people are a minority of senior data scientists working in special companies or on special projects. I’ll repeat: if you’re not one of those people working on special models or special projects, you can almost certainly use pre-built, off-the-shelf tools … which don’t require much math knowledge.

      Concerning model selection and tuning: You really don’t need to know that much math to understand the differences between model types. Many junior people can learn these differences as heuristics and rules of thumb. The same thing can be said about bias/variance. You don’t need calculus or linear algebra to learn bias/variance concepts.

      As I noted in the article, knowing summation notation can enhance conceptual understanding, but summation notation is relatively easy to learn and a relatively small exception to the overall thesis of the article.

      I will repeat: you don’t need calculus, linear algebra, and advanced math to get started learning data science.

      It’s much more profitable to learn practical skills like data cleaning, data visualization, data analysis, and basic machine learning.

  4. Your site is just invaluable and I think you must optimize your site so that thousands of data scientists will stick to your site for life. Also, one suggestion is that please try to redesign the site so that python and R articles are segregated in a better way and are more intuitive.

  5. I want to apply for a Master’s in data science. However, most of the time when I read about data science, it says that it requires a knowledge of math or statistics.

    As a result of this, it makes me think twice before applying as I am not good at maths.

    Any suggestions or recommendation would be appreciated

    Thank you

    • I have to be honest … I think most Master’s degrees are bad investments. Most cost 30K to 50K and you could learn it all on your own, or with much cheaper online courses.

      The only reason that I would ever get a Masters like this is if it is from a truly elite university like Harvard or MIT (I don’t even know if they have DS master’s degrees). 50K is only worth it if you get an elite brand on your resume.

      … Otherwise, spend a few thousand on good books and online courses and learn it for 1/10th the cost.

      The only real issue is that you really need to be disciplined to learn on your own.

    • Calculus and advanced math should really be a secondary consideration when you learn data science.

      I explain it all in the article: conservatively, you’ll spend 10x more time on data cleaning and data visualization than you ever will on calculus or stats.

      In my opinion, students should focus far, far more on data cleaning, data manipulation, data visualization, and data analysis.

      Pick a language (like R or Python) and learn how to do those things fluently.

  6. Thank you a ton!! Reading your blog has made my mind clear, me being a working professional pursuing a PG in AI at a reputed firm, was banging my head for remembering all that Math. I often wondered are they really made into practice with real life scenarios? You have answered all my doubts… Congratulations dear, you just got another one to follow your blogs!

  7. Hello,

    Your article is very good and comprehensive. However, I have the opposite question.
    Let me describe my background a little bit for a bigger picture. I am a Computer Science bachelor graduate who is currently working at a big tech company. In high school and in university I was always more attracted to algorithmics and more theoretical subjects (I was also doing competitive programming). Now, at work, I find my work dull because it doesn’t require enough problem solving and algorithms. I feel like I am not using my brain that much, but mostly doing monotonous work.
    So now I was thinking to transition to machine learning which is a more theoretical subject. However, I fear (and your article amplified my fear) that once working in production as a data scientist, the work will also be monotonous work. Something which will not require a deep understanding of math concepts and problem solving skills, but as you said, more like cleaning data and so on. So now my question is:
    Is there hope for me to find some position in machine learning which actually requires deep knowledge and understanding of the underlying math? Some role where the day to day work is doing deep analysis and problem solving?
    And if there are such roles, how many and how accessible? Do they require a Ph.D.?

    Thank you very much!

    • If you’re more interested in difficult or theoretical work, then machine learning will probably be more interesting.

      Breaking into ML will be harder without a PhD, but not impossible. The best path is probably to start as a lower level data scientist doing simpler work, and then network your way into a more advanced role over 3 to 5 years.

  8. This is a great article. I have a PhD in Business Econ and have worked with a lot of different types of statistics over the years in a business. I rarely need to go back to the things I did in grad school (I took econometrics from Judge and had an econometrician named Newbold on my committee – he wrote a paper with Granger way back) – the most important skill I have now is understanding things in a business context. I think kids in the US get intimidated by math and statistics and some of the teachers are not much help, but students need to learn some of these skills practically – with applications. While I can see pitfalls to running ML without knowing what you are doing, the ‘knowing what you are doing part’ usually relates to understanding the data and the problem at hand – not to fancy math. So data science could be a positive step for a lot of people who have been convinced they are not good at math, even though they have sufficient math knowledge to make practical contributions.

    • Yeah … agree on almost all of this.

      “Understanding” ML almost always means understanding the business problem or task you’re trying to solve; understanding the data; and understanding the heuristics for when to apply certain techniques, regularization, etc.

      Fancy math is very far down on the list of things you need.
