Introduction to Shot-Based Learning (Guest Post for Patrick Kinter)

If you’ve worked in AI and machine learning, you probably know that getting good training data is one of the core challenges in building good models.

And in many cases, we need a lot of training data to build models that generalize well, meaning models that perform well both during training and on new examples once the model is deployed.

But what if you could train a model with only a few examples? Or even one example? Or without any new training examples at all?

You can, under the right conditions.

In this article, I’m going to explain a family of techniques called “shot-based” learning, which enables us to build accurate models with minimal training data.


What is Shot-Based Learning?

Let’s start off by answering the high-level question: what is shot-based learning?

Put simply, shot-based learning is an approach to training machine learning and AI models with a very small number of examples, which we call “shots” (more precisely, the “shot” count is the number of training examples per class).

Within this approach to model training, we have specific types of shot-based learning – such as zero-shot, one-shot, and few-shot. These specific techniques refer to the exact number of examples that we use to train the model, although with zero-shot, we’re not exactly using new examples to train the model. Instead, with zero-shot learning, we’re leveraging the existing abilities of the model to generalize to new, previously unseen classes. I’ll write more about zero-shot, one-shot, and few-shot a little later in the article.

To help you understand this family of techniques further, let’s discuss why we need shot-based learning in the first place. That context will frame up the use cases and the further details of the methodology.

Why Do We Need Shot-Based Learning?

So why do we even need shot-based learning?

In traditional machine learning, we often need a lot of training data.

Even with very simple methods, like linear regression and logistic regression, we often need a large amount of training data. And typically, the more complex the problem, the more training data we need.

If we have an insufficient amount of data, machine learning models often suffer from problems like overfitting, where a model performs very well in training, but fails to perform well with new test examples or in production (i.e., a lack of data typically causes a failure to “generalize”).

So getting enough high-quality, well-labeled data is often one of the top issues in building machine learning and AI systems.

But what if you could train a model with only a few examples?

Or one example?

Or even zero new training examples?!

Well, it’s possible, under the right conditions.

… if you use shot-based learning.

Shot-Based Learning Allows You to Train a Model with Limited Data

At its core, shot-based learning enables you to build models that can generalize well when they are trained with minimal data.

So shot-based learning is particularly useful in situations where high-quality, labeled data is expensive, scarce, or time-consuming to acquire.

For example, in medical diagnostics, we can use shot-based learning to train models to detect rare diseases, where we might only have data examples for a handful of cases.

Or in natural language processing (NLP), we can imagine a situation where we’re trying to perform an analysis on a rare language or dialect, where again, there are limited training examples.

In such cases, we can use shot-based learning to train models that will perform accurately, even with very limited data.

The 3 Main Types of Shot-Based Learning

As I suggested above, there are three main types of shot-based learning:

  • few-shot learning
  • one-shot learning
  • zero-shot learning

[Table: the high-level differences between the three types of shot-based learning: few-shot (a few examples per class), one-shot (one example per class), and zero-shot (no examples of the target class).]

Let’s discuss each of these, one at a time.

Few-Shot Learning

Few-shot learning is a case where we train a model with a small number of examples for every class (i.e., a “few” examples).

So in few-shot learning, we’ll commonly have between 2 and 10 examples per class.

To do this, we typically need to use meta-learning techniques or prior knowledge inside the model to enable the model to generalize effectively.

Few-shot learning is best used in situations where data examples are scarce, but we might be able to get a handful of high-quality labeled examples, such as medical diagnostics for a rare disease.
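
To make this concrete, here’s a minimal sketch of one common few-shot approach (in the style of prototypical networks): embed each example with a pre-trained encoder, average each class’s few “shots” into a “prototype,” and classify new examples by their nearest prototype. Note that the `embed` function below is just a stand-in for whatever pre-trained encoder you’d actually use.

```python
import numpy as np

def embed(example: np.ndarray) -> np.ndarray:
    # Stand-in for a pre-trained encoder (e.g., a CNN or transformer)
    # that maps raw inputs into a learned feature space.
    return example

def build_prototypes(support_set: dict) -> dict:
    """Average the embeddings of each class's few "shots" into a prototype."""
    return {
        label: np.mean([embed(x) for x in examples], axis=0)
        for label, examples in support_set.items()
    }

def classify(query: np.ndarray, prototypes: dict) -> str:
    """Assign the query to the class with the nearest prototype."""
    z = embed(query)
    return min(prototypes, key=lambda label: np.linalg.norm(z - prototypes[label]))

# Example: 3 "shots" per class in a toy 2-D feature space.
support = {
    "healthy": [np.array([0.9, 1.1]), np.array([1.0, 0.9]), np.array([1.1, 1.0])],
    "disease": [np.array([4.0, 4.2]), np.array([3.9, 4.1]), np.array([4.1, 3.8])],
}
prototypes = build_prototypes(support)
print(classify(np.array([3.8, 4.0]), prototypes))  # -> disease
```

The key design choice here is that we never train a classifier on the few shots themselves; the heavy lifting is done by the pre-trained embedding, and the shots only position the class prototypes.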

One-Shot Learning

One-shot learning takes the idea behind few-shot a step further and trains the model on only one example per class.

In spite of the extremely limited data, with one-shot learning we still need the model to classify new examples accurately, and we often need to use specialized techniques to make this work (which I’ll explain further down in the article).

One-shot learning is important in tasks where getting multiple examples is extremely difficult, and where we only have a single example on which to train the model. Facial recognition is an example of a task where we might use one-shot learning.
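
Here’s a minimal sketch of the similarity-based idea often used for one-shot facial recognition: rather than training a classifier per person, we store a single reference embedding per person and match new faces to the closest one. Again, `embed` is a placeholder for a real pre-trained face encoder, and the threshold value is purely illustrative.

```python
import numpy as np

def embed(face: np.ndarray) -> np.ndarray:
    # Placeholder for a pre-trained face encoder (e.g., a siamese network
    # trained so that embeddings of the same person land close together).
    return face / np.linalg.norm(face)

def enroll(reference_faces: dict) -> dict:
    """Store one embedding per person -- the single "shot"."""
    return {name: embed(face) for name, face in reference_faces.items()}

def identify(face: np.ndarray, gallery: dict, threshold: float = 0.9) -> str:
    """Match a new face to the closest enrolled identity by cosine similarity."""
    z = embed(face)
    name, score = max(
        ((n, float(z @ ref)) for n, ref in gallery.items()),
        key=lambda pair: pair[1],
    )
    return name if score >= threshold else "unknown"

# One reference image per person (toy 3-D "faces" for illustration).
gallery = enroll({
    "alice": np.array([0.2, 0.9, 0.1]),
    "bob":   np.array([0.8, 0.1, 0.5]),
})
print(identify(np.array([0.25, 0.85, 0.12]), gallery))  # -> alice
```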

Zero-Shot Learning

Finally, zero-shot learning takes the concept of shot-based learning to its extreme by using zero instances of the target class to train the model.

Said differently, in zero-shot learning, we expect the model to accurately classify examples from classes that were absent from the training data, classes the model has never seen before!

If you know anything about machine learning, you’ll recognize that accurately classifying examples from classes that were absent from the training data is obviously difficult, and it requires special tools and techniques.

So with zero-shot learning, we need to train the model on other classes that share features and attributes with the target class (for which we lack explicit training examples).

In other words, in zero-shot learning the model first learns about classes and the relationships between classes from an initial dataset, using techniques like pre-training (to learn general features, attributes, and relationships in the data space) or knowledge transfer from related tasks. This pre-training or knowledge transfer subsequently helps the model generalize to new classes that were absent from the training data.

For example, we might train a computer vision model on dogs, cats, and wolves, and then use the shared attributes learned by that model (like legs, ears, and fur) to generalize to foxes. This can work because the features learned from the training data (legs, ears, fur) exist in both the training classes (dogs, cats, wolves) and the new, previously unseen class (foxes).
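
To make the attribute-sharing idea concrete, here’s a minimal sketch of classic attribute-based zero-shot classification. It assumes we already have a model that predicts attribute scores for an image (trained only on the seen classes); the unseen class (“fox”) is described purely by its attributes, and the attribute values here are made up for illustration.

```python
import numpy as np

# Attribute profiles: [pointed_ears, bushy_tail, domesticated]
class_attributes = {
    "dog":  np.array([0.5, 0.4, 1.0]),   # seen in training
    "cat":  np.array([1.0, 0.3, 1.0]),   # seen in training
    "wolf": np.array([1.0, 0.8, 0.0]),   # seen in training
    "fox":  np.array([1.0, 1.0, 0.0]),   # UNSEEN: defined only by its description
}

def predict_attributes(image: np.ndarray) -> np.ndarray:
    # Placeholder for a model trained on the *seen* classes that predicts
    # attribute scores (pointed ears? bushy tail? domesticated?) for any image.
    return image

def zero_shot_classify(image: np.ndarray) -> str:
    """Pick the class whose attribute profile best matches the prediction."""
    scores = predict_attributes(image)
    return min(class_attributes,
               key=lambda c: np.linalg.norm(scores - class_attributes[c]))

# An image whose predicted attributes look fox-like:
print(zero_shot_classify(np.array([0.95, 0.95, 0.05])))  # -> fox
```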

So again: with zero-shot learning, we train the model on a different set of tasks or data, and then expect it to generalize to new classes without seeing examples from those classes.

In terms of applications, we often see zero-shot learning in areas like NLP, where it can enable a model to understand words or concepts without explicit training examples for those words or concepts.
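
This is actually easy to try yourself with off-the-shelf tools. Here’s a sketch using the Hugging Face transformers library’s zero-shot classification pipeline, which scores arbitrary candidate labels the model was never explicitly trained to predict (assuming you have transformers installed):

```python
from transformers import pipeline

# A model fine-tuned on natural language inference can score arbitrary
# candidate labels that it was never explicitly trained to predict.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The quarterly report shows revenue grew 12% year over year.",
    candidate_labels=["finance", "sports", "cooking"],
)
print(result["labels"][0])  # highest-scoring label; likely "finance"
```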

Applications of Shot-Based Learning

Now that we’ve discussed what shot-based learning is and some techniques we can use to implement shot-based learning, let’s look at some high-level applications.

There are a variety of ways we can use shot-based learning, but we’ll look at a few specific applications in business and industry, namely in:

  • Healthcare
  • Marketing
  • Customer Service
  • Industrial applications
  • Natural Language Processing
  • Finance

Let’s look at each of these one at a time.

Healthcare: Identifying Rare Diseases

One of the most important uses of shot-based learning is in the area of medical imaging and diagnostics.

In particular, we can use shot-based learning to identify rare diseases where there is limited data and few examples on which to train a model.

In this space, we can use few-shot techniques to train classification models to accurately identify specific medical conditions.

This use of few-shot learning improves diagnostics and strengthens our ability to detect rare diseases when data is scarce.

Marketing: Targeting Niche Groups and New Trends

In marketing, we can use shot-based learning to help us market to niche customer groups or to help us initiate new marketing campaigns for new trends.

More specifically, take a case where we are using market segmentation to target smaller customer subgroups. If a particular segment is very small, it might be difficult to build predictive models that help us market to that small segment, due to a lack of data. In such a situation, we can use shot-based learning to build predictive models or apply other AI techniques, even with limited data.

In the case of a new market trend, there may be very limited data due to the newness of the trend. Again, in such a case, we can use shot-based learning to help us make predictions about how to market to customers for that new trend, even in the face of limited data.

Customer Service: Adaptation to New Service Requests

Since the rise of large language models (LLMs) a few years ago, there has been a trend towards automating customer service with LLM-based chatbots or virtual assistants.

With chatbots and virtual assistants, there may be new or unique service requests from customers that fall outside of the model’s training data.

In such a scenario, zero-shot and few-shot techniques can help these models adapt to new issues and customer questions without extensive retraining.

This enhances the responsiveness and flexibility of these automated customer support tools.
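
One lightweight way to do this with an LLM is few-shot prompting (also called in-context learning): we place a handful of labeled requests directly in the prompt, with no retraining at all. Here’s a minimal sketch of the prompt construction; the example requests and categories are made up for illustration.

```python
def build_few_shot_prompt(examples: list, new_request: str) -> str:
    """Pack a few labeled requests into a prompt for in-context learning."""
    lines = ["Classify each customer request into a support category.", ""]
    for request, category in examples:
        lines.append(f"Request: {request}")
        lines.append(f"Category: {category}")
        lines.append("")
    lines.append(f"Request: {new_request}")
    lines.append("Category:")
    return "\n".join(lines)

# A handful of "shots" -- no model retraining required.
examples = [
    ("My card was charged twice for one order.", "billing"),
    ("The app crashes every time I open settings.", "technical"),
    ("How do I change my delivery address?", "account"),
]
prompt = build_few_shot_prompt(examples, "I was billed after cancelling my plan.")
print(prompt)  # send this prompt to your LLM of choice
```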

Industry: Anomaly and Defect Detection

In industrial settings, we can use shot-based learning to detect anomalies and defects, and to help with predictive maintenance.

Specifically, you can use few-shot techniques to help detect critical events like machine failures. Such a use case could facilitate early detection of problems, which could, in turn, decrease factory downtimes, increase safety, and improve the efficiency of operations.

Natural Language Processing: Translation and Sentiment Analysis

Shot-based learning has become increasingly useful in natural language processing (NLP) tasks like sentiment analysis, text classification, and translation.

In this NLP setting, few-shot, one-shot, and zero-shot techniques enable NLP systems to perform accurately on tasks with limited training data, such as use cases involving rare languages and dialects.

Finance: Fraud Detection

In finance, we can use shot-based techniques for tasks like fraud detection and risk analysis.

For example, some types of financial or transaction fraud might be extremely rare, which makes them difficult to detect with traditional analytical methods. In such a task with limited training data, we can use shot-based techniques to build more accurate models that generalize from a very small set of training examples.

Challenges and Considerations When Using Shot-Based Learning Techniques

Finally, let’s briefly discuss some of the challenges and considerations that we need to keep in mind when we use shot-based techniques.

The main areas that I’ll discuss here are:

  • Data Scarcity and Quality
  • Model Generalization
  • Evaluation and Testing

Let’s briefly look at these individually.

Data Scarcity and Quality

One of the biggest challenges with shot-based learning stems from the nature of the problem itself: the scarcity of data.

By their nature, shot-based techniques operate on limited data; as discussed previously, we use them precisely in circumstances where data examples are rare, expensive, or difficult to acquire.

Therefore, at least in the case of one-shot and few-shot models, we need to make sure that the few examples we can acquire are of sufficient quality to build these models.

Model Generalization

With shot-based techniques, we often also have significant challenges with model generalization, due to the small number of training examples.

Given only a few examples per class to train on (or even zero examples), the model might struggle to generalize. Said differently, even if you can train the model with a few examples, once you put the model into use, it might perform poorly. We typically refer to this issue as “overfitting,” a situation where the model performs well on training data, but performs much worse (i.e., fails to generalize) when presented with new data.

To help a model generalize better, we can use techniques such as meta-learning, data augmentation, and transfer learning. These techniques can help models learn and adapt better, even in tasks with limited data.
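
For example, here’s a minimal sketch of data augmentation for image-like data, assuming that horizontal flips and mild pixel noise don’t change the label in your domain; each real “shot” is stretched into several label-preserving variants:

```python
import numpy as np

def augment(image: np.ndarray, n_copies: int = 4,
            noise_scale: float = 0.02, seed: int = 0) -> list:
    """Generate label-preserving variants of one training image."""
    rng = np.random.default_rng(seed)
    variants = []
    for _ in range(n_copies):
        img = image.copy()
        if rng.random() < 0.5:
            img = np.fliplr(img)                              # horizontal flip
        img = img + rng.normal(0.0, noise_scale, img.shape)   # mild pixel noise
        variants.append(np.clip(img, 0.0, 1.0))
    return variants

# One "shot" becomes five training examples (the original plus 4 variants).
image = np.random.default_rng(1).random((28, 28))
training_set = [image] + augment(image)
print(len(training_set))  # -> 5
```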

Evaluation and Testing

Model validation and testing inherently pose a special challenge in shot-based learning due to the limited data.

With traditional machine learning methods, we often have large datasets for model validation and testing. But with the limited data in shot-based learning, the traditional tools and techniques that we might use for validation and testing are essentially unavailable.

So with shot-based learning, we typically have much greater difficulty when we attempt to evaluate the performance of the model, and when we try to detect overfitting.

To mitigate this issue with evaluation and testing, we often need to use techniques like cross-validation or few-shot benchmarking.
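
For example, with only a handful of examples, leave-one-out cross-validation is a common fallback: train on all but one example, test on the held-out one, and repeat for every example. Here’s a minimal sketch with scikit-learn on a toy dataset (assuming scikit-learn is installed):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# A tiny dataset: 4 examples per class -- far too small for a normal
# train/validation/test split.
X = np.array([[0.9, 1.1], [1.0, 0.9], [1.1, 1.0], [0.8, 1.0],
              [4.0, 4.2], [3.9, 4.1], [4.1, 3.8], [4.2, 4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Each of the 8 rounds trains on 7 examples and tests on the 1 held out.
model = KNeighborsClassifier(n_neighbors=1)
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print(f"Leave-one-out accuracy: {scores.mean():.2f}")
```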

Wrapping Up

In this article, I’ve tried to give you a high-level overview of shot-based learning: what shot-based learning is; the different types of shot-based learning (few-shot, one-shot, and zero-shot); applications of shot-based learning in fields like marketing, healthcare, and finance; and some of the challenges of shot-based learning.

Ultimately, shot-based learning is a powerful tool in our AI development toolkit that enables us to build accurate models even in the face of scarce data, and in turn, helps us solve difficult problems when training examples are rare or expensive to obtain.

Joshua Ebner

Joshua Ebner is the founder, CEO, and Chief Data Scientist of Sharp Sight. Prior to founding the company, Josh worked as a Data Scientist at Apple. He has a degree in Physics from Cornell University. For more daily data science advice, follow Josh on LinkedIn.
