The post The Best Python Package for Data Visualization appeared first on Sharp Sight.

Although you often hear about the importance of data *manipulation* (i.e., “80% of data science is data manipulation”), data visualization is just as important.

I explained why in a recent blog post about why you need to master data visualization. That blog post has a detailed explanation of why data visualization is important, and I recommend that you read it if you haven’t already.

To summarize though, data visualization:

- Is required for almost every step of the data science workflow
- Is critical at every data science job level, but especially at junior and intermediate levels

Whether you’re doing data exploration, data analysis, finding insights, storytelling with data, or building a machine learning model, you will probably need to use data visualization. So, you need to make learning data visualization a priority.

If you’re a Python user though, you’re going to run into a bit of a problem when you try.

Frankly, data visualization in Python is a pain in the a**.

When I say this, I’m mostly talking about Matplotlib.

Matplotlib is the de facto standard for data visualization in Python. It’s the package that’s used in 90% of the books, videos, and courses that I’ve seen. It’s ubiquitous.

But the fact is, Matplotlib is hard to use for the most part.

Things like simple bar charts and scatterplots are somewhat easy to create, but that’s where the simplicity ends.

Doing relatively simple things like adding multiple lines to a line chart, or creating a “dodged” bar chart (with different bars for different categories) is often confusing.

And slightly more complicated charts, like a small multiple chart, require a lot of confusing syntax.

Additionally, matplotlib doesn’t seem to work well natively with Pandas dataframes, which are now a standard for data science in Python.

This is in stark contrast to R’s ggplot2. ggplot2, although a little confusing to beginners, is extremely powerful and easy to use once you understand how it works.

Matplotlib, on the other hand, is almost always complicated and confusing. When you use it, things just seem harder than they should be.

In an effort to find something better, I’ve personally looked at other options.

I was initially interested in Bokeh, which is very powerful. But Bokeh is not really great for statistical visualization. Bokeh excels at *interactive* visualizations. It’s really designed to produce visualizations that live on the internet, not necessarily static visualizations that are used for data exploration and analysis.

It also doesn’t work as well natively with Pandas dataframes.

What we really need is a visualization toolkit that’s easy to understand, easy to use, works well with dataframes, and is capable of producing a wide range of statistical visualizations that we can use for data exploration and analysis.

After some research and experimentation, I discovered that such a package exists.

It’s Seaborn.

After quite a bit of experimentation with various Python data visualization packages, I discovered that Seaborn is the best option right now for statistical data visualization in Python.

There are a few major reasons for this.

The biggest reason that I like Seaborn is that it’s easy to use.

In Seaborn, most common statistical visualizations can be created with a simple line or a few lines of code.

For example, if you want to create a scatterplot in Seaborn, once you have your data, the plot itself is very easy to create.

sns.scatterplot(data = norm_data,
                x = 'x_var',
                y = 'y_var')

I wrote that code on 3 lines to make it easier to read, but it could have been 1 line of code.
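For context, here’s a fully self-contained version of that example. The original doesn’t show how `norm_data` is built, so here it’s a hypothetical dataframe of normally distributed values:

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: the post doesn't show how norm_data is created,
# so we build a dataframe of normally distributed values ourselves
np.random.seed(42)
norm_data = pd.DataFrame({'x_var': np.random.normal(size = 100),
                          'y_var': np.random.normal(size = 100)})

# The scatterplot itself: one function call
sns.scatterplot(data = norm_data, x = 'x_var', y = 'y_var')
```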

Notice as well that the parameters make sense.

There’s a `data` parameter to set the dataframe. And there are `x` and `y` parameters to set the variables that go on the x-axis and y-axis, respectively.

The functions in Seaborn are fairly well designed, easy to understand, and easy to use.

Seaborn also enables you to create a wide variety of statistical data visualizations.

For example, bar charts are very straightforward to create.

So are line charts:

But Seaborn also enables you to create slightly more exotic visualizations.

For example, you can use Seaborn to create violin plots:

Two variable density plots:

Small multiple charts:

As well as “pair plots,” heatmaps, and more.

Seaborn is a full toolkit for creating data visualizations that will help you analyze your data and find valuable insights.

And like I mentioned previously, all of these visualizations are relatively easy to create. Seaborn has simple functions for creating all of them.

Pandas dataframes have become a standard for storing and manipulating data for data science projects.

Because of this, I’d argue that a great data visualization package should work seamlessly with Pandas dataframes.

That’s one of the reasons that I really like Seaborn.

Seaborn works natively with dataframes.

When you study the syntax, you’ll notice that most of the Seaborn functions have a `data` parameter. Many also have `x` and `y` parameters.

In these Seaborn functions, the `data` parameter enables you to specify a Pandas dataframe as an input. They work seamlessly with dataframes.

Moreover, the `x` and `y` parameters enable you to directly specify dataframe variables (i.e., columns) as inputs. This also works for other parameters, like the `hue` parameter, which enables you to modify plot colors based on a third variable.

When you start working with Seaborn, you find that it “just works” with Pandas dataframes.

Finally, you can use Seaborn to create beautiful, multivariate visualizations with only a few lines of code.

The “beauty” of these visualizations is important, because it makes them engaging to your audience.

But what’s more important here is that we can use Seaborn’s tools to easily manipulate plot aesthetics like size, opacity, color, and position to create visualizations that naturally produce insights. We can use Seaborn’s functionality to create visualizations that literally enable us to *see* the important features of our data. We can also use these complex visualizations to communicate insights to our partners.

I’ll emphasize again that Seaborn makes this relatively easy to do. Due to the simplicity of the syntax, creating beautiful, insightful visualizations becomes relatively easy once you know how it works.

Seaborn is hands down my favorite data visualization package for Python right now.

It’s the primary tool that I use for statistical data visualization.

And it’s also the only Python data visualization package that I teach to our beginning students (although, at advanced levels, students may want to explore other tools for specific use cases).

If you’re looking for a Python data visualization package that’s easy to use, easy to understand, powerful, and works well with Pandas dataframes, I think that Seaborn is definitely the best option.

If you’re ready to master data visualization in Python using Seaborn, you should join our new course, Seaborn Mastery.

Seaborn Mastery is our premium course to help you master data visualization in Python using the Seaborn package.

We’ve designed this course to be the absolute fastest way to master Seaborn. The course breaks everything down, step by step. It clearly explains all of the techniques. It clearly explains the syntax. And it will give you a practice system that will help you memorize all of the syntax you learn. Ultimately, the course will help you master Seaborn within only a few weeks.

This is a brand new course, and we’ll open enrollment for it on Tuesday September 15.

If you’re ready to master data visualization in Python, this is the course you’ve been waiting for.


The post Why You Need to Master Data Visualization appeared first on Sharp Sight.

All of us have limited time and there is a lot to learn in data science.

Many overzealous students throw caution to the wind, and immediately try to learn advanced topics like machine learning. As I’ve said many times, this is foolish.

Instead, you need to master the basics before you move on to advanced topics.

In past blog posts, I’ve emphasized the importance of data manipulation.

It’s true that data manipulation is a huge part of data science, and is one of the critical foundations.

But I’d also argue that data visualization is just as important.

Along with data manipulation, data visualization is one of the essential pillars of data science.

So why is data visualization so critical?

I’d argue that there are two main reasons that data visualization is so important:

- Data visualization is used at almost all levels of data science
- Data visualization is critical for almost every part of the data science workflow

Essentially, you need data visualization at almost every level of data science job (junior, intermediate, and advanced). And you also need data visualization for almost every step in your day-to-day data science work.

Let’s analyze this a little more closely, so you can understand why you need to master data visualization early in your data science career.

First of all, data visualization is used at all data science job levels.

To explain this, let me give you a rough sketch of three different data science job levels.

- Junior Data Scientist
- Intermediate Data Scientist
- Advanced Data Scientist

To be clear: these are very rough categories. Most companies will have several sub-categories within each of these (e.g., “Junior Data Scientist I”, “Junior Data Scientist II”, etc). Many companies will also have specialist roles related to data science, like database engineer, machine learning specialist, etc. And some unique companies will have very different job categories for data scientists.

However, these three job titles show a very general job progression that you’ll find at many businesses.

Let’s quickly discuss these different job levels, and what they entail. When we do, you’ll see that data visualization is critical for almost all of them.

This level is what I consider to be the “entry level” of a data science team.

At this job level, you’re getting data, cleaning it up, exploring it, visualizing it, analyzing it, and presenting results.

As a junior data scientist, you’re sort of a glorified data analyst. I like to call this type of work “data analysis on steroids.”

Notice though that data visualization is one of the key skills.

In fact, I’d argue that data visualization is somewhere between 20% and 50% of the work.

We can argue about what percentage exactly (the percent will be different at different companies and in some specific roles). What’s important is that data visualization is a *critical part of the job*.

As a junior data scientist, you need data visualization.

As an Intermediate Data Scientist, your work will actually look a lot like the work of a Junior Data Scientist.

In many ways, it’s more of the same: you’ll be getting data, cleaning it up, visualizing it, and analyzing it.

There are two major differences though.

First, at this stage, you might start to specialize in something.

Some data scientists are great at getting and cleaning data, so they might specialize in that. Other data scientists are better at data visualization, and specialize in visualization. Others still might have a knack for presentation, and may be tasked more with presentations and client-facing deliverables.

The second difference at this level, is that machine learning commonly comes into play.

While Junior Data Scientists rarely do much machine learning (in spite of what you might have heard), Intermediate level data scientists might start to build some ML models.

In many cases, these will be simpler types of models like linear regression, logistic regression, clustering, and decision trees. (More advanced techniques like deep learning will be more rare.)

What’s important to recognize, is that at this level, *you’ll still be doing a lot of data visualization*. You might specialize in something, and you might start using a few advanced techniques, but the job still requires a *lot* of data visualization.

If you need to get and explore a dataset, you’ll probably need to use data visualization.

If you need to perform an analysis and “find insights” in your data, you’ll definitely need to use data visualization.

And if you start to do some machine learning, guess what … you’ll still need to use some data visualization to evaluate your models, run diagnostics, and communicate results.

Data visualization is still critical at an intermediate data science job level.

Once you hit an advanced level, the required skills typically shift.

Here, you’re much more likely to be using machine learning techniques, and more “advanced” techniques like deep learning.

It’s also much more likely that you’ll be responsible for building systems and applications. You might also be responsible for optimizing those systems, which requires a much deeper understanding of software engineering and algorithms.

Having said that, although data visualization is a somewhat smaller proportion of the job at this level, it’s still important.

In almost all cases, you’ll still be getting data and analyzing it, which means that you’ll still need to use both data manipulation and data visualization techniques.

And again, if you’re doing machine learning, you’ll still need to use data visualization to evaluate model performance, run diagnostics, and communicate your results.

As I’ve said: data visualization is a necessary skill at almost every data science job level.

Not only is data visualization critical for different job levels on a data science team, data visualization is used in almost every part of the day-to-day data science workflow.

Almost from start to finish, no matter what you’re doing as a data scientist, you’ll need to use data visualization as either the primary tool, or a supplementary tool to get your work done.

For example, when you first get or create a dataset, one of the first things you’ll need to do is explore the data.

Before you do anything else, you simply need to know what’s in the data.

To find out, you’ll perform “data exploration.”

Oftentimes, we’re trying to answer basic questions like:

- how numeric variables are distributed
- what categories exist in categorical variables
- whether there are any lingering problems with the data that still need to be fixed

At this stage, you’re just trying to figure out what’s in the data.

To do this, you’ll use many simple data inspection techniques, like the Python `print()` function, and data aggregation techniques.

But you can also use data visualization.

For example, if you want to view the data distribution for a variable, you can create a histogram or density plot (e.g., you can use the Seaborn distplot function, or histplot in newer versions of Seaborn).

Similarly, you can create a bar chart to examine categorical variables or use scatterplots to examine pairs of numeric variables.

When you’re doing initial data exploration, data visualization is actually a very useful toolkit.

Similarly, data visualization is critical for data analysis.

Data analysis is somewhat similar to data exploration, in that you’re trying to understand the data.

But when you do data analysis, you’re commonly trying to answer specific questions like “how can we improve sales” or “what is the biggest opportunity for Team A.”

To answer these questions, you’re going to perform data analysis.

At this phase of work, a lot of young data scientists get stuck. Many data science students have no idea how to do “data analysis.”

To help, I’ll let you in on a secret …

For the most part, data analysis is really just data manipulation and data visualization combined together.

Now to be fair, the actual process of combining data manipulation and data visualization to do data analysis is a little more complicated. But at its root, data analysis is mostly just applied data manipulation and data visualization.
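To make that concrete, here’s a tiny hypothetical example of the combination: a Pandas group-and-aggregate step followed by a Seaborn chart of the result (the column names are made up for illustration):

```python
import pandas as pd
import seaborn as sns

# Hypothetical raw data
orders = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'B'],
                       'sales': [100, 150, 90, 120, 60]})

# Data manipulation: aggregate sales by team
sales_by_team = (orders
                 .groupby('team', as_index = False)
                 .agg(total_sales = ('sales', 'sum')))

# Data visualization: plot the aggregated result
sns.barplot(data = sales_by_team, x = 'team', y = 'total_sales')
```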

So if you want to analyze your data, you need to master data visualization first.

This is a good point to talk about “finding insights” in data.

If you look at almost any data-related job description, it will say that applicants need to be able to “find insights in data.”

WTF does that mean?

Finding insights is really just data analysis. Companies want you to be able to analyze data to find valuable pieces of information that they can use to improve metrics (i.e., profitability, customer retention, system performance, etc).

If you want to be able to “find insights in data,” you really need to be able to *analyze* data. And as I just mentioned, this ultimately means that to find insights, you need to be able to apply data visualization in a strategic way.

Again, data visualization is critical.

Once you analyze your data and find some insights, you commonly need to *communicate* those insights to other people.

Frequently you’ll communicate your results to your colleagues and immediate management.

But many times, you’ll also need to communicate your results to higher level executives.

These people are often less technical. They also have less time and need an ultra-distilled message that clearly shows the most important findings or results.

How do you do this?

You use data visualization.

The most common method of communicating upward to an executive team is with a PowerPoint-style presentation with lots of charts and graphs.

In these presentations, you literally need to *show* your audience the most important findings and results.

Data visualizations – like bar charts, line charts, etc – are almost always the best way to communicate quickly and clearly to a management team.

They are easy to create, easy to explain, and usually easy to understand (if you use them properly).

I’d argue that data visualizations are the most important tools for data communication and for “storytelling with data.” You absolutely need data visualizations for this phase of work.

Many data science students think that machine learning is about equations and algorithms, but that’s mostly not true.

It does help to understand the mathematics behind machine learning techniques, but most of the actual computation is performed by the computer.

Moreover, most of the code to perform those computations is abstracted away. For most data scientists, building a machine learning model is as simple as calling a few functions (like the `fit()` method from scikit-learn).

But that’s not to say that machine learning is easy. It’s still difficult.

Frequently, you still need to build several different models, compare models, evaluate performance, and run diagnostics.

How do you do this?

In many cases, to evaluate models and diagnose problems, you need to use data visualization techniques. To compare models, you can use some standard techniques like bar charts. To evaluate model performance, there are some more specialized charts like the ROC curve.

These are just a couple of examples. The overall point is that in practice, building machine learning models involves lots of analysis and visualization.
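For instance, the points of an ROC curve can be computed from predicted scores and true labels with plain Numpy. A minimal sketch under that framing (in practice you’d likely reach for `sklearn.metrics.roc_curve` and then plot the result):

```python
import numpy as np

def roc_points(y_true, scores):
    """Compute (false positive rate, true positive rate) pairs
    by sweeping the decision threshold over the sorted scores."""
    order = np.argsort(-scores)        # highest score first
    y = np.asarray(y_true)[order]
    tps = np.cumsum(y)                 # true positives at each cutoff
    fps = np.cumsum(1 - y)             # false positives at each cutoff
    tpr = tps / y.sum()
    fpr = fps / (len(y) - y.sum())
    return fpr, tpr

# Hypothetical model outputs
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.5])

fpr, tpr = roc_points(y_true, scores)
# fpr and tpr trace the ROC curve from (0, 0.25) up to (1, 1)
```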

I hope you’re getting the picture.

Data visualization is a foundational skill for data science.

You need it for almost every step of the data science workflow.

You need it at almost every data science job level.

Without data visualization skill, you will probably be completely unqualified to do actual work.

But with it, you’ll not only be qualified to do work at a junior level, but you’ll be prepared to learn more advanced data science skills later.

So before you jump to advanced topics like machine learning, you should master data visualization.

Mastering data visualization should be just as big of a priority as mastering data manipulation.

So study hard, and master data visualization very early in your data science career.

If you’re ready to master data visualization in Python, you should join our new course, Seaborn Mastery.

Seaborn Mastery is our premium course to help you master data visualization in Python using the Seaborn package.

We’ve designed this course to be the absolute fastest way to master Seaborn. It breaks everything down, step by step. It clearly explains all of the techniques. And it will give you a practice system that will help you memorize all of the syntax you learn. Ultimately, the course will help you master Seaborn within only a few weeks.

This is a brand new course, and we’ll open enrollment for it on Tuesday September 15.

If you’re ready to master data visualization in Python, this is the course you’ve been waiting for.


The post Numpy Square, Explained appeared first on Sharp Sight.

The Numpy square function is fairly straightforward, but in this tutorial, we’ll explain everything step by step.

You’ll learn about how the syntax works, and we’ll also show you some clear examples.

**Table of Contents:**

You can click on any of the links above, and it will take you to the appropriate section.

Having said that, to get a full understanding of the function, you should probably read the whole thing.

Let’s start off with a quick introduction to the function.

As you probably know, the Numpy Square function is a function in the Numpy package in Python.

All of the functions in Numpy perform operations or computations on numeric data. That’s essentially what np.square does too.

The Numpy square function calculates the square of a numeric input.

So for any value x that we use as an input to the function, Numpy square will calculate x².

In some ways, it’s similar to some other mathematical functions in Numpy like Numpy power, which computes exponents for input values, or Numpy log, which computes natural logarithms. Numpy square is similar to these other functions in the sense that they share a similar syntactical structure.
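A quick sketch of that family resemblance; all three functions follow the same call pattern:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])

print(np.square(x))      # [1. 4. 9.]
print(np.power(x, 2))    # same result via the general exponent function
print(np.log(x))         # natural logarithms, same call pattern
```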

Having said that, let’s take a look at the syntax.

Here, I’ll explain the syntax of np.square. But before I do that, I need to make one quick note about importing Numpy.

Remember: whenever we use a Python module, we need to import that module first.

How we import a module will change how we write the syntax.

Having said that, in all of the following examples and explanations, we’ll assume that you’ve imported Numpy with the alias `np`.

To do this, you can run the following code:

import numpy as np

This is the common convention for importing and using Numpy, so we’ll be sticking with that here.

Ok.

As I mentioned previously, the syntax for the square function is very, very simple.

Typically, we’ll call the function as `np.square()`.

Inside of the parentheses, there is a parameter `x` that allows us to specify an argument to the function (i.e., an input on which Numpy will compute the square).

Let’s take a quick look.

`x` (required): The `x` parameter enables you to specify the input to the function (i.e., the argument).

The input can be a single number, a Numpy array, or an “array-like” object. Array-like objects are things like Python lists or tuples.

Technically, there are other parameters for the square function, such as `out` and `where`.

These are used somewhat rarely, so we will not cover them in this tutorial.

Now that you’ve learned about the syntax, let’s take a look at some examples of Numpy square.

**Examples:**

- Example 1: Calculate the square of a single value
- Example 2: Calculate squares of values in a 1D array
- Example 3: Calculate squares of values in a 2D array

Before you run any of these examples, you’ll need to import Numpy.

As explained in the syntax section of this tutorial, how we import Numpy will impact the exact syntax.

The common convention among Python data scientists is to import Numpy with the alias `np`, and we’ll be using that convention here in these examples.

So before you run these examples, make sure that you import Numpy with the following code:

import numpy as np

Ok. Now that you’ve done that, let’s get started.

First, we’ll start with a very simple example.

Here, we’ll calculate the square of the value 5.

np.square(5)

OUT:

25

This is very simple.

When we use Numpy square and provide a single number as the argument to the function, np.square simply computes the square of that value.

So when we use the value 5 as the input, np.square computes 5², which equals 25.

Next, we’ll compute the square of the values in a 1-dimensional Numpy array.

To do this, we’ll first create a simple Numpy array, and then we’ll use that as an input to Numpy square.

Here, we’ll create a very simple Numpy array using the Numpy array function.

array_1d = np.array([0,1,2,3,4])

Now that we have an array, we can run Numpy square.

np.square(array_1d)

OUT:

array([ 0, 1, 4, 9, 16])

Again, this is fairly straightforward.

When we provide a 1-dimensional array to Numpy square, np.square computes the square of every element.

Finally, let’s use Numpy square on a 2-dimensional array.

This really works the same as computing the values in a 1D array. The only major difference is the structure of the input.

First, we’ll create a 2-dimensional Numpy array.

We’ll create an array with the values from 1 to 9, arranged in a 3-by-3 shape.

To do this, we’ll use the Numpy arange function in combination with the Numpy reshape method.

array_2d = np.arange(start = 1, stop = 10).reshape((3,3))

Now that we have our 2D array, let’s use Numpy square.

Here, we’re simply going to provide the name of the array, `array_2d`, as the input to `np.square()`.

np.square(array_2d)

OUT:

array([[ 1,  4,  9],
       [16, 25, 36],
       [49, 64, 81]])

Again, this is fairly straightforward.

When we provide a Numpy array as an input to np.square, the function will compute the square of every value in the array, element-wise.

Do you have questions about Numpy square?

Leave your questions in the comments section below.

The examples here about Numpy square are pretty simple and easy to understand.

But other parts of Numpy can be a lot more complicated.

If you’re serious about learning Numpy (and serious about data science in Python), you should consider joining our premium course called *Numpy Mastery*.

Numpy Mastery will teach you everything you need to know about Numpy, including:

- How to create Numpy arrays
- How to use the Numpy random functions
- What the “Numpy random seed” function does
- How to reshape, split, and combine your Numpy arrays
- How to perform mathematical operations on Numpy arrays
- and more …

Moreover, it will help you completely *master* the syntax within a few weeks. You’ll discover how to become “fluent” in writing Numpy code. If you have trouble remembering Numpy syntax, this is the course you’ve been looking for.

Find out more here:

Learn More About Numpy Mastery


The post Numpy Absolute Value, Explained appeared first on Sharp Sight.

Ultimately, you’ll learn how to compute absolute values with Numpy.

We’ll start with a brief overview of the function, and then work with some examples.

I recommend that you read the whole tutorial, but if you’re looking for something specific, you can click on any of the following links to navigate to a specific section.

**Table of Contents:**

Let’s start with a quick overview of what Numpy absolute value does.

Numpy absolute value calculates absolute values in Python.

It can do this with single values, but it can also operate on Numpy arrays.

Let’s quickly review what Numpy arrays are, just for context.

A Numpy array is a data structure in Python that contains numeric data.

These arrays can be 1 dimensional, like this:

Or they can be two dimensional:

Numpy arrays can also be multi-dimensional.

We can create Numpy arrays using a variety of techniques, like Numpy zeros, Numpy empty, Numpy randint, Numpy arange, and other techniques.

We often use Numpy arrays for anything involving computation on strictly numeric data, like scientific computing and some forms of machine learning.

(For a more complete explanation, see our tutorial on Numpy arrays.)

So what does Numpy absolute value do?

Put simply, Numpy absolute value calculates the absolute value of the values in a Numpy array.

So let’s say we have some numbers in an array, some negative and some positive.

If we apply Numpy absolute value, it will calculate the absolute value of every value in the array. The output of the function will be a new array with the absolute values.

It’s actually a very simple function, much like the other mathematical functions in Numpy like Numpy power, Numpy exponential, and Numpy log.

So now that you’ve learned about what Numpy absolute value does, let’s take a look at the syntax.

The syntax is actually very simple, but before we get into the syntax, there’s an important thing that I need to point out.

Whenever we use Numpy, we need to import the module.

How we import the module will have an impact on the syntax, so I need to specify how we will import it.

The common way of importing Numpy is with the following code:

import numpy as np

When we import Numpy with the alias `np`, we can use this prefix when we call Numpy functions like Numpy absolute value.

This is by far the most common way to import Numpy, and it’s the syntactical convention that we’ll be using going forward.

The syntax of Numpy absolute value is extremely simple. It’s possibly one of the simplest Numpy functions.

We can call the function as `np.abs()`. As noted above, this assumes that we’ve imported Numpy as `np`.

Alternatively, we can call the function as `np.absolute()`. The functions `np.absolute()` and `np.abs()` are essentially the same; `np.abs()` is simply a shorthand version of `np.absolute()`. You can use whichever one you like.

Inside of the function, we need to provide an input on which to operate.

Let’s quickly talk about that.

The numpy.abs function really only has one commonly used argument: the `x` argument.

`x` (required): The `x` argument enables us to specify the input array.

The argument that we provide here can be a Numpy array, but also an array-like object, such as a Python list.

Keep in mind that you are *required* to provide something here. You need to provide some type of input.

Also note that you do not use `x` explicitly as a keyword parameter. What I mean is that you do not explicitly include `x` in your syntax; the code `np.abs(x = myarray)` will produce an error. It is implied that any input you provide to the function will serve as the argument to the `x` parameter.
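A quick sketch confirming that behavior (the array name `myarray` is just for illustration):

```python
import numpy as np

myarray = np.array([-1, 2, -3])

try:
    np.abs(x = myarray)         # passing x as a keyword fails
except TypeError as err:
    print('TypeError:', err)

print(np.abs(myarray))          # positional input works: [1 2 3]
```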

The np.absolute function does have some other parameters, including:

- out
- where

These parameters are somewhat uncommonly used, so we will not discuss them in this tutorial.

Okay. Now that we’ve looked at the syntax, let’s take a look at some examples.

**Examples:**

- Example 1: Calculate absolute value for a single value
- Example 2: Calculate absolute values for a 1D array
- Example 3: Calculate absolute values for a 2D array

Before you run any of the examples below, you’ll need to import Numpy properly.

You can use the following code to import Numpy:

import numpy as np

This will enable us to use the alias `np` in our code for the Numpy module.

Ok, let’s start with a very simple example.

Here, we’re going to compute the absolute value of a single number.

np.abs(-5)

OUT:

5

This is extremely simple.

Here, the input to the Numpy absolute value function is a negative number, `-5`. When we provide this value as the argument to the function, np.abs() simply computes the absolute value, which is 5.

To be clear, you could also run this code as `np.absolute(-5)`. It’s effectively the same as `np.abs(-5)`.
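Note that the input doesn't have to be an integer. `np.abs()` also works on floats, and on complex numbers it returns the magnitude. A quick sketch:

```python
import numpy as np

# Floats work the same way as integers:
print(np.abs(-2.5))  # 2.5

# For a complex number, np.abs() returns the magnitude,
# i.e., sqrt(real**2 + imag**2):
print(np.abs(3 - 4j))  # 5.0
```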

Next, we’ll do a slightly more complicated example.

Here, we’ll compute the absolute values of an array of values.

To do this, we first need to create an array, and then we can use `np.abs()` on that array.

First, we’ll create a 1-dimensional array that contains the values from -2 to 2.

To do this, we’ll use the Numpy arange function.

array_1D = np.arange(start = -2, stop = 3)

And we can print it out to see the values.

print(array_1D)

OUT:

[-2 -1 0 1 2]

Obviously, this array contains two negative values.

Next, we’ll apply the `np.abs()` function to compute the absolute values.

np.abs(array_1D)

OUT:

array([2, 1, 0, 1, 2])

When we use np.abs() on the array `array_1D`, it computes the absolute value of every value in the array.

The output is a new array that contains those absolute values.

Finally, let’s use Numpy absolute value on a 2-dimensional array.

This is very similar to example 2, where we operated on a 1D array.

First, we’ll create a 2-dimensional array with the Numpy arange function and the Numpy reshape method:

array_2D = np.arange(start = -4, stop = 5).reshape((3,3))

And let’s print it out:

print(array_2D)

OUT:

[[-4 -3 -2]
 [-1  0  1]
 [ 2  3  4]]

This is a simple 2D array that contains the values from -4 to 4, arranged in a 3 by 3 shape.

Next, we can apply np.abs to the 2D array.

np.abs(array_2D)

OUT:

array([[4, 3, 2],
       [1, 0, 1],
       [2, 3, 4]])

This is really simple.

The numpy absolute value function simply operates on every element of the 2D array, element-wise.

The output is a new Numpy array that has the same shape. The values of the new array are the absolute values of the values of the input array.
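We can verify that shape point (and that the dtype is preserved too) with a quick sketch:

```python
import numpy as np

# Recreate the 2D array from the example above
array_2D = np.arange(start=-4, stop=5).reshape((3, 3))
result = np.abs(array_2D)

# The output has the same shape and the same dtype as the input:
print(result.shape == array_2D.shape)  # True
print(result.dtype == array_2D.dtype)  # True
```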

Now that we’ve looked at a few examples, let me answer a common question about np absolute value.

**Frequently asked questions:**

The most common question about Numpy absolute value is whether you should use `np.abs()` or `np.absolute()`.

The short answer is: it doesn’t matter.

`np.abs()` is a shorthand version of `np.absolute()`. They are effectively the same function, and work the same way.

Pick one version, and move on.

Do you have other questions about Numpy absolute value?

Leave your questions in the comments section at the bottom of the page.

The examples you’ve seen in this tutorial should be enough to get you started, but if you’re serious about learning Numpy, you should enroll in our premium course called *Numpy Mastery*.

There’s a lot more to learn about Numpy, and *Numpy Mastery* will teach you everything, including:

- How to create Numpy arrays
- How to use the Numpy random functions
- What the “Numpy random seed” function does
- How to reshape, split, and combine your Numpy arrays
- How to perform mathematical operations on Numpy arrays
- and more …

Moreover, it will help you completely *master* the syntax within a few weeks. You’ll discover how to become “fluent” in writing Numpy code.

Find out more here:

Learn More About Numpy Mastery

The post Numpy Absolute Value, Explained appeared first on Sharp Sight.

The post Why You Need to be a Top Performer appeared first on Sharp Sight.

I think that the industry is going to split – and in fact, it probably already has.

This split that I’m referring to is the divide between the top performers – the top few percent – and everyone else.

The truth is that this divide already exists to some extent. If I had to guess, out of everyone who has ever started studying data science, only about 20% will ever get paid to use those data skills.

There are a lot of reasons for this.

Some people decide that data science is too hard and they give up.

Other people lose interest and focus their efforts elsewhere.

But one of the big reasons is that many people simply never develop enough skill.

It’s like many other skill areas, like sports or music. For example, many people learn basketball and play it as a sort of hobby, but the vast majority of people will never be skilled enough to be paid for it.

Granted, basketball is an extreme example. And the market for data scientists is larger than the market for basketball players. Still, the principle is the same. There’s a relatively small percent of would-be data scientists who succeed, and a large percent who struggle, fail, or drop out.

You need to decide which you’ll be.

The divide between the top performers and everyone else is already big. As I said, I suspect that only about 20% of people who study data science will ever get jobs in the field.

But I think that divide will get larger.

At least two things are driving this: layoffs and new technology.

Right now in August of 2020, we’re all dealing with the covid crisis.

The economic effects of covid haven’t hit tech companies quite as hard, but they have impacted some other industries that employ tech workers (e.g., retail).

Ultimately, many companies have experienced a drop in revenue, and many will use this as a reason (or excuse) to lay some people off.

I hate to break this to you, but the lowest skilled people are probably going to be hit by this the most.

In times like these, companies keep the top performers and get rid of the lowest performers. That’s just the way that it is.

Another big trend that’s just getting started is the use of AI systems to replace white-collar workers.

Most recently the GPT-3 AI system from OpenAI has shown the ability to generate working code based on English language input from a user. The user can describe what they want in English, and GPT-3 can create an application *and working code* based on the English language description.

The obvious question is whether or not these AI systems will replace programmers and data scientists.

I think that the answer is a little complicated.

AI-based systems will replace *low skill* programmers.

If you’re a low-skill, cut-and-paste coder, GPT-3 and similar systems may replace you. These systems will be able to write code faster and more accurately than lower-tier programmers by empowering people with zero experience. 23-year-old marketing managers will be able to generate their own code with the help of AI. They won’t need low-skill coders … they’ll just do it themselves.

But the high skill coders will be a different story.

Why?

For the highest skilled people, AI systems will be a *productivity tool*.

For a person who already has high levels of skill and understanding, they will simply get more done.

AI systems will empower the top 10 to 20% and enable them to write more code, faster.

For highly skilled people, AI systems will enhance their skill. These AI systems and other tools will be *productivity tools* that will make the best people even better.

With emerging tools and technologies, the best data scientists will become even more valuable.

Ultimately, what I want you to understand is that the market for data scientists will possibly split in two over the next few years.

New tools and technology will likely empower the best people to create massive amounts of value … and they’ll be rewarded with either high salary or high business valuations.

The lowest-skilled people, though, are going to face ever fiercer competition, which will drive down wages.

You need to choose which you’ll become: a top-tier data scientist that succeeds wildly, or one of the bottom 80% that struggle endlessly.

