## How much data science do you actually remember?

How many data science books have you read? 5? 10? A few dozen? How many free online courses have you taken? A few? How many blog posts have you read? (I’d be willing to bet: you’ve read dozens.) If you’re like most budding data scientists, you’ve probably consumed a lot of material. You probably even … Read more

## How to use data analysis for machine learning (example, part 1)

In my last article, I stated that for practitioners (as opposed to theorists), the real prerequisite for machine learning is data analysis, not math. One of the main reasons for that claim is that data scientists spend an inordinate amount of time on data analysis. The traditional statement is that data scientists “spend 80% … Read more

## The real prerequisite for machine learning isn’t math, it’s data analysis

When beginners get started with machine learning, the inevitable question is “what are the prerequisites? What do I need to know to get started?” And once they start researching, beginners frequently find well-intentioned but disheartening advice, like the following: You need to master math. You need all of the following: – Calculus – Differential equations … Read more

## Why you should start by learning data visualization and manipulation

One of the biggest issues that comes up when I talk to people who want to get started learning data science is the following: I don’t know where to get started! Recently, I argued that R is the best programming language to learn when you’re getting started with data science. While this helps you select … Read more

## How data visualizations are tools (and what you’re building with them)

In a recent post, I wrote that when you’re starting out with data, you need to focus much more on process and technique than on syntax. Beginning students hear this, but it’s easy to ignore the advice and get lost down the rabbit hole of syntax.

The problem is …

## Data analysis example with ggplot and dplyr (analyzing ‘supercar’ data, part 2)

For our purposes here, data exploration is the application of data visualization and data manipulation techniques to understand the properties of our dataset.

We’re going to be looking for interesting features: things that stand out, trends, and relationships between variables.