GPT is Not a Shortcut to Learning Data Science

A couple of days ago, I wrote a blog post about how GPT writes bad Pandas code.

If you’ve been reading the blog for a while, you’ve probably realized that I have mixed feelings about AI code generators.

On one hand, they can save you lots of time by writing large amounts of code quickly. And they will also help you solve some problems that you previously didn’t know how to solve.

But on the other hand, AI coding systems like GPT often write bad code. Sloppy code. Code that may mislead you about best practices and how to solve problems in a clear way.

Along these lines, reader John posted this insightful comment under the most recent blog post:

An image of a comment about GPT code, by a reader named John.

Let’s give John a round of applause, because he made several good points here:

  1. GPT writes unreadable Pandas code
  2. GPT is only usable if you know what you’re doing already
  3. GPT will cause beginners to develop bad habits
  4. With GPT, the good programmers will become great and the bad programmers will be replaced

And he sums it up by saying, “there are no shortcuts to becoming a great Data Scientist.”

Yes.

Hell yes.

Let’s quickly discuss each of these points.

GPT writes unreadable Pandas code

I’ve noted this several times, but it bears repeating: GPT writes bad Pandas code.

Ugly code.

Unreadable code.

Code that’s hard to work with and debug.

Now I know what some of you tasteless beasts are thinking: “WhO cARes iF tHE CoDe is uGLy?”

Please. Have some decency.

We have standards here.

I sort of mean that literally.

I teach my students certain coding standards.

In particular, I teach my students to write clean code. Code that’s easy to read, easy to understand, easy to work with, easy to debug, and easy to share.

Clean code is an asset to you and your team because it reduces bugs and reduces time spent fixing (or even understanding) the code.

In contrast, hard-to-read, hard-to-work-with code is a form of technical debt. And you “pay” that debt with time spent trying to understand it, or with bugs that cause problems (which you inevitably need to fix).

As I’ve shown in several past blog posts, however, GPT writes ugly, unreadable code.

And this bad GPT code that you don’t understand will eventually impose costs on you and your team.
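To make the contrast concrete, here’s a small hypothetical before-and-after. The data and column names are mine (not taken from the earlier post), but the two styles are real: the first uses throwaway intermediates and bracket-heavy indexing, the second is the kind of readable method chain I teach.

```python
import pandas as pd

# Toy sales data (illustrative only; not from the original post)
sales = pd.DataFrame({
    'region': ['East', 'West', 'East', 'West'],
    'units':  [10, 25, 5, 40],
    'price':  [3.0, 2.0, 3.0, 2.0],
})

# Hard-to-read style: throwaway intermediates and bracket-heavy indexing
tmp = sales[sales['units'] > 8].copy()
tmp['revenue'] = tmp['units'] * tmp['price']
result_ugly = tmp.groupby('region')['revenue'].sum()

# Cleaner style: one readable chain that states each step in order
result_clean = (
    sales
    .query('units > 8')
    .assign(revenue=lambda d: d['units'] * d['price'])
    .groupby('region')['revenue']
    .sum()
)

# Both compute the same numbers; only one is easy to read and debug
assert result_ugly.equals(result_clean)
```

The chained version reads top-to-bottom like a sentence: filter, compute, group, sum. That’s the property that makes code cheap to review and debug, and it’s exactly what GPT’s output tends to lack.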

GPT is only usable if you know what you’re doing already

Related to the problem of code ugliness is the problem of usability: your ability to understand what the code is actually doing.

Let me tell you a quick story.

A while ago, I started learning a foreign language.

In the beginning, I had learned some words and phrases, but my knowledge was still limited.

At one point early on, I decided to read a novel in that language.

It didn’t go well.

As I tried to read the book, I could understand a few things here and there, like a few specific words.

But through most of it, I was confused.

Most of the time, I struggled to understand full sentences. And the plot? It was almost impossible for me to understand.

Now ask yourself: if I struggled to read something in that language, do you think I could write in that language?

Of course not.

My limited skill in that language was a hindrance to doing anything of substance, from reading complex material to writing in that language.

And it’s worth noting that if you can’t read or write in a language, it will be difficult for you to evaluate someone else’s work in that language.

Something similar will be true for many people using GPT to write code.

Many of these people will have limited understanding of the code. They will have limited ability to read or write in that programming language. And in turn, they will have limited ability to appraise the GPT code and evaluate it for potential mistakes.

Said differently, AI-written code will be hard to use properly unless you already know what you’re doing.

GPT will cause beginners to develop bad habits

In the last section, I noted that AI coding systems will cause problems for people who don’t know much about the programming language.

But what about people who actively want to learn the language?

For people with some willingness to learn, GPT will cause a different problem.

Some people have noted that GPT systems and LLMs will be excellent tutors. They will be able to teach you almost anything you want to learn.

In principle, this is true of programming and data science.

But in practice, using GPT systems to learn data science programming is likely to cause problems.

Because the code is bad (as I’ve already noted), if you use GPT and AI coding systems to learn how to code, you’re going to pick up lots of bad habits.

Again: a perfect example of this is the most recent blog post that I wrote about GPT writing bad Pandas code.

In that blog post, you can see me arguing with GPT (and being a bit of an a**hole), because GPT repeatedly made the code more complicated than it needed to be.

Additionally, if you read the end of the blog post, you’ll see that there was a much simpler solution to the problem that I developed myself, without the aid of GPT. This “better” solution was simple, straightforward, and easy to understand. But to develop this solution myself, I needed expertise and experience with Pandas.

All of GPT’s solutions were complex and unwieldy, and the best solution (the type of solution you need to learn) was something that an expert human coder created.

Ultimately, GPT’s Pandas style is the exact style you need to avoid.

So if you use GPT (or similar AI systems) to learn code, you’re going to pick up many bad habits.

It might seem like a shortcut to learning data science syntax, but in the end, GPT systems will turn you into a bad data science programmer.

With GPT, the good programmers will become great and the bad programmers will be replaced

This final point is related to the others.

If you use GPT to learn data science and programming, you’ll pick up bad habits.

If you use GPT to write your code without understanding that code, it will cause problems in your scripts.

If you use GPT to write your code, GPT will write code that’s unreadable and hard-to-use.

In the end, GPT will make you a sloppy, low value-add data scientist.

But why should any company pay you hundreds of thousands of dollars for bad code and bad outcomes?

The question becomes even more important considering that GPT will empower almost everyone with these skills.

Why should a company pay you big money to write sloppy code, when they could outsource your job to someone cheaper who’s using the same AI system?

Answer: They won’t pay you big money to write sloppy code, but they WILL outsource your job to someone else using the exact same AI system.

The only way to survive this new AI coding era is to either:

  • accept rock bottom compensation for your services, or
  • become a 10X data scientist (because you already know what you’re doing and are augmented with AI)

AI will undercut bad or mediocre performers.

But AI will augment people who are already good, and make them great.

If you want to be in that latter group, you need to learn data science first, and then augment yourself with AI.

You Need To Learn Data Science Programming

Ultimately, relying on GPT systems to write your code will cause problems.

They will:

  • write bad code
  • be usable mostly by people who already know what they’re doing
  • cause you to develop bad habits
  • undercut your ability to earn (unless you’re one of the top percent who become 10X data scientists)

I still believe that there will be big opportunities in data science and tech, but only if you really know what you’re doing.

There are no shortcuts.

If you want to have a great career in data science, you need to study and master the field first, largely without GPT.

Leave your Comments

Do you agree that AI coding systems will cause problems for people who don’t know what they’re doing, but will augment people who already have some skill?

I want to hear your opinion.

Leave your comments in the comments section at the bottom of the page.

Joshua Ebner

Joshua Ebner is the founder, CEO, and Chief Data Scientist of Sharp Sight. Prior to founding the company, Josh worked as a Data Scientist at Apple. He has a degree in Physics from Cornell University. For more daily data science advice, follow Josh on LinkedIn.

4 thoughts on “GPT is Not a Shortcut to Learning Data Science”

  1. GPT is just getting started, so it is risky to use. OK, your points are straightforward and clear.
    But the point you missed is that this happens with Pandas, not with other libraries, because Pandas is unclear and twisted: mainly, it is never clear when Pandas grammar will accept the dot form or the bracket form for the fields in a record; they seem like they should be interchangeable, but they are not.
    A critique and reformulation of the Pandas “language” is in order too.
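    For example, with a toy DataFrame (names illustrative), the two forms diverge in ways that surprise beginners:

```python
import pandas as pd

df = pd.DataFrame({'count': [3, 4], 'total sales': [1, 2]})

# Bracket form works for any column name:
counts = df['count']            # the column, as a Series
totals = df['total sales']      # names with spaces are fine

# Dot form quietly diverges:
# 1. `df.total sales` is a SyntaxError; attribute names can't contain spaces.
# 2. `df.count` is the DataFrame *method* count(), not the column;
#    class attributes shadow same-named columns.
assert callable(df.count)
# 3. Dot form can't create a new column; it just sets a Python attribute
#    (pandas emits a UserWarning here).
df.new_col = [5, 6]
assert 'new_col' not in df.columns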

    Reply
    • 1. The problem isn’t with Pandas. There’s a style of Pandas coding that fixes the problems that I’ve shown. I’ve demonstrated this Pandas style over and over here at the Sharp Sight blog. The problem is that 95% of people write bad Pandas code, and these LLMs have all been trained on that bad Pandas code.

      2. There are problems beyond Pandas, too. If you ask it to do data visualization, it will often give you bad data visualization code. For example, I recently asked it to create a particular type of visualization, and it did give me code that worked, but that code was stuffed with unnecessary syntax that I didn’t need or ask for. I re-prompted GPT multiple times, and it kept giving me bad code padded with syntax that was unnecessary for the visualization I asked for.

      Pandas and visualization are just two examples… I’ve seen other examples where GPT wrote bad code.

      Right now, it’s a systemic problem.

      Reply
  2. Hi Joshua,

    I would like to enroll in your Pandas mastery course, but I didn’t receive the email with the next steps.

    Reply

Leave a Comment