{"id":7048,"date":"2023-10-02T18:00:07","date_gmt":"2023-10-02T23:00:07","guid":{"rendered":"https:\/\/www.sharpsightlabs.com\/?p=7048"},"modified":"2024-02-06T15:02:00","modified_gmt":"2024-02-06T21:02:00","slug":"plot-roc-curve-in-python-seaborn","status":"publish","type":"post","link":"https:\/\/www.sharpsightlabs.com\/blog\/plot-roc-curve-in-python-seaborn\/","title":{"rendered":"The Best Way to Plot an ROC Curve in Python"},"content":{"rendered":"
This tutorial will show you how to plot an ROC curve in Python using the Seaborn Objects visualization package.<\/p>\n
The tutorial is divided into sections for easy navigation, so if you need anything specific, just click on the appropriate link below.<\/p>\n
Table of Contents:<\/strong><\/p>\n <\/a><\/p>\n I recently published a rather extensive article on ROC curves<\/a>, but here, let’s start with a quick review of what ROC curves are.<\/p>\n In machine learning classification systems, ROC curves are an indispensable tool for model evaluation. In particular, we use them with binary classification (although there are some ways to use ROC curves with multiclass classification as well).<\/p>\n Put simply, an ROC curve visualizes the True Positive Rate and False Positive Rate for a classifier, across a range of classification thresholds.<\/p>\n <\/p>\n Remember: many classifiers, and probabilistic classifiers in particular, produce a “probability score” that measures the classifier’s confidence that a particular example belongs to the positive class. These classifiers compare the probability score to a threshold<\/a>, and this determines which examples will be predicted as positive, and which will be predicted as negative. <\/p>\n Moreover, because we can move the threshold, the number of predicted positives and negatives depends on the choice of threshold.<\/p>\n ROC curves help us understand the tradeoff between detecting True Positives<\/a> and avoiding False Positives<\/a>, as we change the classification threshold. <\/p>\n And in doing this, they ultimately illustrate the ability of a classifier to differentiate between the positive and negative classes at different threshold values. <\/p>\n Ok. So ROC curves help us visualize True Positive Rate and False Positive Rate.<\/p>\n But why do we need them exactly?<\/p>\n Once again: The power of ROC curves lies in their ability to depict how well a classifier distinguishes between positive and negative classes across a range of thresholds<\/em>. 
<\/p>\n So in essence, they show us how the model will perform across a range of threshold settings, and this in turn can help us choose the right operating point (i.e., the right threshold setting) given the goals and needs of our system.<\/p>\n In some systems, you might want to minimize False Positives, and want a high threshold. <\/p>\n In other systems, you might want to maximize True Positives, and decide to set a lower threshold. <\/p>\n In either case though, an ROC curve will show you the tradeoff between those metrics for your classifier, as you move the threshold. And in turn, that will help you pick the best threshold for your needs, given those tradeoffs.<\/p>\n Generally speaking, a classifier that performs well will have an ROC curve that hugs the top left corner. This indicates that the model can achieve a high True Positive Rate with a low False Positive Rate. <\/p>\n <\/p>\n Alternatively, a classifier that performs poorly will have an ROC curve that hugs the diagonal line from the bottom left hand corner to the upper right.<\/p>\n <\/p>\n To be clear though, ROC curves can be a complicated topic, and I’m leaving out some important details. 
For a more in-depth explanation, you should read our blog post ROC Curves, Explained<\/a>.<\/p>\n In the rest of this blog post, I want to show you how to create<\/em> an ROC curve in Python, with my favorite Python visualization package.<\/p>\n <\/a><\/p>\n In this tutorial, I’m going to show you how to plot an ROC curve in Python.<\/p>\n Specifically, we’re going to plot an ROC curve using the Seaborn Objects visualization package<\/a>.<\/p>\n But doing that will require several steps.<\/p>\n We need to:<\/p>\n Let’s take those one at a time.<\/p>\n If you already have some ROC curve data \u2013 meaning true positive and false positive metrics \u2013 then feel free to skip to the section where we plot the ROC curve with Seaborn<\/a>.<\/p>\n First, we need to import several packages.<\/p>\n We’re going to plot the ROC curve<\/a> with the Seaborn objects system, so we obviously need to import that.<\/p>\n But before we plot it, we need to generate the data for the ROC curve itself, which will require several steps. Specifically, we need to:<\/p>\n That being said, we also need to import <\/a><\/p>\n Next, we need to make the data for our Python ROC curve.<\/p>\n Remember: when we plot an ROC curve, we plot the false-positive-rate vs the true-positive-rate. <\/p>\n In other words, we need to generate data for false positive rate and true positive rate for a classification model.<\/p>\n There are many steps to this. <\/p>\n In this case, we need to:<\/p>\n In the interest of brevity, and because this tutorial is about plotting<\/em> an ROC curve, I’m going to refrain from explaining each step. 
You can click on any of the above links to understand each step.<\/p>\n Once you run this code, you’ll have the true positive data ( <\/a><\/p>\n Here, we’re going to use Seaborn to plot our Python ROC curve.<\/p>\n Remember: when we plot an ROC curve, we plot the false positive rate vs the true positive rate.<\/p>\n That means here, we’re going to plot And to do this, we’ll use the Seaborn Objects visualization system<\/a> to create a line chart<\/a> from our true positive\/false positive data.<\/p>\n Let’s run the code, and then I’ll explain.<\/p>\n OUT:<\/p>\n <\/p>\n So what did we do here?<\/p>\n Here, we’ve used the Seaborn Objects system to plot both the ROC curve for the model (in red) as well as the diagonal reference line (in blue).<\/p>\n Syntactically, we’re using the Seaborn Objects The first call to If you want to learn more about line charts with Seaborn Objects, you should read our tutorial about Seaborn Objects line plots<\/a>.<\/p>\n <\/a><\/p>\n One last closing note.<\/p>\n I recommend using Seaborn for almost all of your data visualization in Python.<\/p>\n And in particular, I recommend it for your data visualization tasks related to machine learning in Python.<\/p>\n The new Seaborn Objects system<\/a> has a very easy-to-use syntax.<\/p>\n It’s much easier to use than Matplotlib.<\/p>\n It’s easier to use than Plotly.<\/p>\n And it’s extremely powerful, once you know how to use it.<\/p>\n I’ve started doing 98% of my visualization with Seaborn Objects<\/a>, and I recommend that you do the same. <\/p>\n So it’s not only great for making an ROC curve in Python machine learning, but also for a variety of other ML visualizations.<\/p>\n\n
A Quick Review of ROC Curves<\/h2>\n
Why We Need ROC Curves<\/h3>\n
Good and Bad ROC Curves<\/h3>\n
Plot an ROC Curve in Python using Seaborn Objects<\/h2>\n
\n
Import Packages<\/h3>\n
\n
make_classification<\/code>,
train_test_split<\/code>,
LogisticRegression<\/code>, and
roc_curve<\/code> from Scikit Learn in order to do all of those things. <\/p>\n
\r\n# SCIKIT-LEARN TOOLS FOR BUILDING THE ROC CURVE DATA\r\nfrom sklearn.datasets import make_classification\r\nfrom sklearn.linear_model import LogisticRegression\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn.metrics import roc_curve\r\n\r\n# SEABORN OBJECTS FOR PLOTTING\r\nimport seaborn.objects as so\r\n<\/pre>\n
Make Data for ROC Curve<\/h3>\n
\n
\r\n# GENERATE CLASSIFICATION DATASET WITH 2 CLASSES\r\nX, y = make_classification(n_samples=1000, n_classes=2, random_state=1)\r\n\r\n\r\n# SPLIT DATA INTO TRAIN\/TEST\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)\r\n\r\n\r\n# GENERATE CONSTANT "NO SKILL" SCORES FOR THE 45-DEGREE LINE\r\nnoskill_probabilities = [0 for number in range(len(y_test))]\r\n\r\n\r\n# BUILD & FIT CLASSIFICATION MODEL\r\nmy_logistic_reg = LogisticRegression()\r\nmy_logistic_reg.fit(X_train, y_train)\r\n\r\n\r\n# PREDICT PROBABILITIES FOR TEST SET\r\nprobabilities_logistic_reg = my_logistic_reg.predict_proba(X_test)\r\n\r\n\r\n# KEEP PROBABILITIES FOR ONLY THE POSITIVE OUTCOME\r\nprobabilities_logistic_posclass = probabilities_logistic_reg[:, 1]\r\n\r\n# CALCULATE DATA FOR DIAGONAL REFERENCE LINE\r\nfalseposrate_noskill, trueposrate_noskill, _ = roc_curve(y_test, noskill_probabilities)\r\n\r\n# CALCULATE DATA FOR ROC CURVE (Logistic Regression Model)\r\nfalseposrate_logistic, trueposrate_logistic, _ = roc_curve(y_test, probabilities_logistic_posclass)\r\n<\/pre>\n
trueposrate_logistic<\/code>) and false positive data (
falseposrate_logistic<\/code>) that you need for the Python ROC curve.<\/p>\n
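To make it concrete what each (false positive rate, true positive rate) pair represents, here is a minimal pure-Python sketch that computes one ROC point per threshold by hand. The function name `rates_at_threshold` and the toy labels and scores are hypothetical and chosen only for illustration; the sketch assumes an example is predicted positive when its score is greater than or equal to the threshold. In practice you would keep using `roc_curve` from Scikit Learn as shown above.

```python
def rates_at_threshold(y_true, scores, threshold):
    """Return (false_positive_rate, true_positive_rate) for one threshold.

    Assumption: an example is predicted positive when score >= threshold.
    """
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1   # correctly flagged positive
        elif predicted_positive and label == 0:
            fp += 1   # negative wrongly flagged as positive
        elif label == 1:
            fn += 1   # positive that was missed
        else:
            tn += 1   # negative correctly left alone
    true_positive_rate = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    return false_positive_rate, true_positive_rate


# Hypothetical toy data: two negatives and two positives.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

# Sweep the threshold from strict to lenient; each threshold gives one ROC point.
roc_points = [
    rates_at_threshold(toy_labels, toy_scores, threshold)
    for threshold in (0.9, 0.5, 0.3, 0.0)
]
print(roc_points)
```

Lowering the threshold moves the point up and to the right: more True Positives are caught, at the cost of more False Positives.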
Plot the ROC Curve with Seaborn Objects<\/h3>\n
falseposrate_logistic<\/code> vs
trueposrate_logistic<\/code>.<\/p>\n
\r\n# PLOT: SEABORN OBJECTS\r\n(so.Plot()\r\n .add(so.Line(color = 'red'), x = falseposrate_logistic, y = trueposrate_logistic)\r\n .add(so.Line(color = 'blue', linestyle = 'dashed'), x = falseposrate_noskill, y = trueposrate_noskill)\r\n .layout(size = (8,5))\r\n)\r\n<\/pre>\n
Explanation<\/h5>\n
.add()<\/code> method to add each line.<\/p>\n
.add()<\/code> adds the red ROC curve, and the second call to
.add()<\/code> adds the dashed blue reference line.<\/p>\n
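It may not be obvious why a list of all-zero scores produces the diagonal reference line. The rough sketch below, in pure Python, shows the idea under the assumption that an example is predicted positive when its score is greater than or equal to the threshold. The helper `roc_point` and the toy labels are hypothetical. With constant scores, any threshold either flags every example or none, so the only ROC points are (0, 0) and (1, 1), and the line drawn between them is the diagonal.

```python
def roc_point(y_true, scores, threshold):
    """Return (false_positive_rate, true_positive_rate) for one threshold."""
    positives = sum(1 for label in y_true if label == 1)
    negatives = len(y_true) - positives
    # Count flagged examples (score >= threshold) in each class.
    tp = sum(1 for label, s in zip(y_true, scores) if s >= threshold and label == 1)
    fp = sum(1 for label, s in zip(y_true, scores) if s >= threshold and label == 0)
    return fp / negatives, tp / positives


# Hypothetical toy labels, with the same constant "no skill" score for everyone.
toy_labels = [0, 1, 0, 1, 1]
constant_scores = [0 for _ in toy_labels]

# A threshold above the constant score flags nothing.
nothing_flagged = roc_point(toy_labels, constant_scores, 1)
# A threshold at the constant score flags everything.
everything_flagged = roc_point(toy_labels, constant_scores, 0)

print(nothing_flagged, everything_flagged)
```

Those two endpoints are all the dashed blue line needs, since a line chart simply connects them.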
Closing Note: Use Seaborn for ML Visualization<\/h3>\n