How to use NumPy reshape

NumPy arrays are an important component of the Python data science ecosystem. When working with NumPy arrays, you’re going to need to be able to perform basic data manipulation. In particular, you may need to change the “shape” of the data, that is, how the data are arranged in the NumPy array. …
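As a rough illustration of what changing the shape of an array means, here is a minimal sketch using `numpy.reshape`. The array contents and the variable names `values` and `grid` are made up for this example and do not come from the post itself.

```python
import numpy as np

# A 1-dimensional array holding the six values 0 through 5.
values = np.arange(6)

# Rearrange the same six values into 2 rows and 3 columns.
# The data are unchanged; only their arrangement differs.
grid = values.reshape((2, 3))

print(values.shape)  # (6,)
print(grid.shape)    # (2, 3)
print(grid)
```

The new shape has to account for exactly the same number of elements as the original (here 2 × 3 = 6); otherwise NumPy raises a `ValueError`.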

How to do linear regression in R

A visualization of an example linear regression in R, performed using ggplot2.

Linear regression. It’s a technique that almost every data scientist needs to know. Although machine learning and artificial intelligence have developed much more sophisticated techniques, linear regression is still a tried-and-true staple of data science …

R vs Python … which to learn for data science

One of the most common questions I get from data science hopefuls is “which programming language should I learn?” My general advice is “it depends.” Or to clarify my response, I like to ask the question “who are you, and what are your goals?” The programming language you use depends on your background and your …

How to use mutate in R

If you want to master data science in R, you need to master foundational tools like the mutate() function. Readers here at the Sharp Sight blog will know how much we emphasize “foundational” data science skills. If you want to be effective as a junior data scientist, you need to master the fundamental skills. If …

The most important thing for getting your first data science job

Right now (in 2018) the average salary for a data scientist is over $120,000 per year, according to Glassdoor. Considering that the median household income in the US is about $55,000 per year, the average salary for a data scientist represents a pretty large premium. Granted, you won’t start out at …

How to rename columns in R

In this blog post, I’ll show you how to rename columns in R. This is pretty straightforward if you know how to do it properly, but there are also some little challenges in renaming variables. So very briefly, I’ll explain why renaming variables in a dataframe can be a little confusing in R. Then, I’ll …

How to create a substring in R

Substring of characters 1 to 6, which reads "fluent"

If you want to be a data scientist, you need to master core data manipulation tools. One particular skill you’ll need to master is string manipulation. You need to be able to work with strings (i.e. character data) in order to clean, modify, or reshape them. In this blog post, you’ll learn one specific string …

A key for mastering data science

A few days ago, I received an email from a Sharp Sight reader. The author of the email is having trouble learning data science in R. He’s taken several data science courses, but still has trouble with critical data science skills. Here’s an excerpt from his email: I’ve redacted the company names, but I’ll tell …