
How This Book Is Different

Contextual Application of Techniques 👩🏻‍💻

We emphasize that there's no one-size-fits-all solution to address imbalanced data. The book guides you through establishing a baseline and contextualizes techniques according to domain, data distribution, performance metrics, and business objectives. It's an essential toolkit for any scenario involving imbalanced data, ensuring you're prepared even for challenges you haven't yet encountered.

Fun Learning with Comics and Images 🥳

Unique to our approach is the inclusion of comics throughout the book. These creative additions not only make complex concepts more digestible but also add an element of fun to your learning experience. They serve as visual metaphors, drawing parallels between technical ideas and everyday situations, enhancing understanding and retention.

Practical Insights from Industry 🚀

Throughout the book, 'In production' tip boxes provide insights into how big companies like OpenAI have dealt with imbalanced data, offering a real-world perspective that complements the theoretical knowledge.

Key Features


The book is packed with detailed explanations, illustrations, and code samples using modern machine learning frameworks.

Learn cutting-edge deep learning techniques to overcome data imbalance.

Explore different methods for dealing with skewed data in ML and DL applications.

Book Description

Understand the prevalence and challenges of class imbalances in machine learning datasets.

Explore essential techniques like sampling and cost-sensitive learning to address imbalances.

Delve into sophisticated methods using PyTorch to enhance deep learning models.

Gain hands-on experience with real-world applications through fully functional code examples.

Learn to identify imbalances and implement corrective strategies effectively across various machine learning models.


What you will learn

1. Use imbalanced data in your machine learning models effectively
2. Explore the metrics used when classes are imbalanced
3. Understand how and when to apply various sampling methods, such as oversampling and undersampling
4. Apply data-based, algorithm-based, and hybrid approaches for dealing with class imbalance
5. Combine and choose from various options for data balancing while avoiding the common pitfalls
6. Understand the concepts of model calibration and threshold adjustment in the context of imbalanced datasets

Target audience

This comprehensive guide is designed for a wide range of professionals, including ML researchers, data scientists, software engineers, and anyone seeking practical insights. Whether you're a student, an experienced data expert, or someone looking to integrate ML solutions into software applications, this book has valuable insights for you.

Table of Contents

  1. Introduction to Data Imbalance in Machine Learning
  2. Oversampling Methods
  3. Undersampling Methods
  4. Ensembling Methods
  5. Cost-Sensitive Learning
  6. Data Imbalance in Deep Learning
  7. Data-level Deep Learning Methods
  8. Algorithm-level Deep Learning Methods
  9. Hybrid Deep Learning Methods
  10. Model Calibration

Chapter 1, Introduction to Data Imbalance in Machine Learning

In this chapter, we will discuss and define imbalanced datasets, explaining how they differ from other types of datasets. The ubiquity of imbalanced data will be demonstrated with examples of common problems and scenarios. We will also go through the basics of machine learning and cover the essentials, such as loss functions, regularization, and feature engineering. We will also learn about common evaluation metrics, particularly those that can be very helpful for imbalanced datasets. We will then introduce the imbalanced-learn library.
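
To give a flavor of the kind of setup this chapter works with, here is a minimal sketch (not taken from the book) that builds a skewed toy dataset with scikit-learn and scores a plain classifier with imbalanced-learn's per-class report; the 95/5 split and the logistic regression model are illustrative assumptions.

```python
from collections import Counter

from imblearn.metrics import classification_report_imbalanced
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy binary dataset with roughly a 95/5 class split (illustrative ratio).
X, y = make_classification(n_samples=5_000, weights=[0.95, 0.05], random_state=42)
print(Counter(y))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)
clf = LogisticRegression(max_iter=1_000).fit(X_train, y_train)

# imbalanced-learn's report surfaces per-class precision, recall, and
# specificity, which plain accuracy hides on skewed data.
print(classification_report_imbalanced(y_test, clf.predict(X_test)))
```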

Chapter 2, Oversampling Methods

In this chapter, we will introduce the concept of oversampling, discuss when to use it, and cover the various techniques for performing it. We will also demonstrate how to utilize these techniques through the imbalanced-learn library APIs and compare their performance using some classical machine learning models. Finally, we will conclude with some practical advice on which techniques tend to work best under specific real-world conditions.
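
As a rough illustration of what the imbalanced-learn oversampling APIs look like (a sketch, not code from the book), the snippet below compares random oversampling with SMOTE on a toy 90/10 dataset; in practice you would resample only the training split.

```python
from collections import Counter

from imblearn.over_sampling import RandomOverSampler, SMOTE
from sklearn.datasets import make_classification

# Toy 90/10 binary dataset (illustrative ratio).
X, y = make_classification(n_samples=5_000, weights=[0.9, 0.1], random_state=0)
print("original:           ", Counter(y))

# Random oversampling duplicates existing minority samples.
X_ros, y_ros = RandomOverSampler(random_state=0).fit_resample(X, y)
print("random oversampling:", Counter(y_ros))

# SMOTE synthesizes new minority samples by interpolating between neighbors.
X_sm, y_sm = SMOTE(random_state=0).fit_resample(X, y)
print("SMOTE:              ", Counter(y_sm))
```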

Chapter 3, Undersampling Methods

In this chapter, you will learn about the concept of undersampling, including when to use it and the various techniques to perform it. You will also see how to use these techniques via the imbalanced-learn library APIs and compare their performance with some classical machine learning models.
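
In the same hedged spirit, here is a small sketch (not from the book) of two imbalanced-learn undersampling APIs: random undersampling, which balances classes by discarding majority samples, and Tomek links, which prunes majority samples near the class boundary.

```python
from collections import Counter

from imblearn.under_sampling import RandomUnderSampler, TomekLinks
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=5_000, weights=[0.9, 0.1], random_state=0)
print("original:            ", Counter(y))

# Drop majority samples at random until both classes are the same size.
X_rus, y_rus = RandomUnderSampler(random_state=0).fit_resample(X, y)
print("random undersampling:", Counter(y_rus))

# Tomek links only remove majority samples that sit right on the class
# boundary, so the result is cleaner but still imbalanced.
X_tl, y_tl = TomekLinks().fit_resample(X, y)
print("Tomek links:         ", Counter(y_tl))
```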

Chapter 4, Ensembling Methods

The problem with traditional ensemble methods is that they use classifiers that assume balanced data, so they may not work very well with imbalanced datasets. In this chapter, we combine popular machine learning ensembling methods with the techniques for dealing with imbalanced data that we studied in previous chapters, and discuss those combinations in detail.
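
For example (an illustrative sketch, not the book's code), imbalanced-learn ships ensemble estimators that rebalance each bootstrap sample before fitting, so every base learner sees roughly balanced data:

```python
from imblearn.ensemble import BalancedBaggingClassifier, BalancedRandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5_000, weights=[0.9, 0.1], random_state=0)

# Random forest where each tree is grown on a bootstrap sample that has been
# rebalanced by undersampling the majority class.
brf = BalancedRandomForestClassifier(n_estimators=100, random_state=0)
print("balanced random forest F1:", cross_val_score(brf, X, y, scoring="f1").mean())

# Generic bagging wrapper that resamples every bag before fitting its estimator.
bbc = BalancedBaggingClassifier(n_estimators=50, random_state=0)
print("balanced bagging F1:      ", cross_val_score(bbc, X, y, scoring="f1").mean())
```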

Chapter 5, Cost-Sensitive Learning

Cost-sensitive learning is an effective strategy for tackling imbalanced data. We will go through this technique and learn why it can be useful. This will help us understand some of the details of cost functions and how machine learning models are not designed to deal with imbalanced datasets by default. While machine learning models aren't equipped to handle imbalanced datasets out of the box, we will see how modern libraries enable cost-sensitive learning.
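
As a rough sketch of the idea (illustrative, not taken from the book), many scikit-learn estimators expose a class_weight argument that scales the loss so that minority-class mistakes cost more:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Default: every misclassification carries the same cost.
plain = LogisticRegression(max_iter=1_000).fit(X_tr, y_tr)

# class_weight="balanced" reweights the loss inversely to class frequency,
# making errors on the minority class more expensive.
weighted = LogisticRegression(max_iter=1_000, class_weight="balanced").fit(X_tr, y_tr)

print("plain    F1:", f1_score(y_te, plain.predict(X_te)))
print("weighted F1:", f1_score(y_te, weighted.predict(X_te)))
```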

Chapter 6, Data Imbalance in Deep Learning

Class-imbalanced data is a common issue for deep learning models. When one or more classes have significantly fewer samples, the performance of deep learning models can suffer, as they tend to prioritize learning from the majority class, resulting in poor generalization for the minority class(es). In this chapter, we will cover a brief introduction to deep learning, data imbalance in deep learning, an overview of deep learning techniques to handle data imbalance, and multi-label classification.

Chapter 7, Data-level Deep Learning Methods

In this chapter, we'll explore how to apply familiar sampling methods to deep learning models. Deep learning offers unique opportunities to enhance these methods further. We'll delve into elegant techniques to combine deep learning with oversampling and undersampling. Additionally, we'll learn how to implement various sampling methods with a basic neural network. We'll also cover dynamic sampling, which involves adjusting the data sample across multiple training iterations, using varying balancing ratios for each iteration. Then, we will learn to use some data augmentation techniques for both images and text. We'll end the chapter by highlighting key takeaways from a variety of other data-level techniques.
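
As one concrete, illustrative example of a data-level technique in PyTorch (a sketch, not the book's code), a WeightedRandomSampler can make each mini-batch roughly class-balanced without touching the dataset itself:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy imbalanced dataset: 900 negatives and 100 positives (illustrative numbers).
labels = torch.cat([torch.zeros(900, dtype=torch.long),
                    torch.ones(100, dtype=torch.long)])
features = torch.randn(1_000, 16)
dataset = TensorDataset(features, labels)

# Weight each sample inversely to its class frequency.
class_counts = torch.bincount(labels)
sample_weights = 1.0 / class_counts[labels].float()

# The sampler draws minority samples more often, so batches are roughly balanced.
sampler = WeightedRandomSampler(sample_weights, num_samples=len(dataset), replacement=True)
loader = DataLoader(dataset, batch_size=64, sampler=sampler)

xb, yb = next(iter(loader))
print("positives in one batch:", int(yb.sum()))  # close to ~32 on average
```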

Chapter 8, Algorithm-level Deep Learning Methods

This chapter will be along the same lines as Chapter 5, Cost-Sensitive Learning, extending the ideas to deep learning models. We will look at algorithm-level deep learning techniques to handle the imbalance in data. Generally, these techniques do not modify the training data and often require no pre-processing steps, offering the benefit of no increase in training time or additional runtime hardware costs.
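
Two common algorithm-level ideas are sketched below (illustrative code, not the book's): a class-weighted cross-entropy loss and a minimal focal loss, both of which change only the loss function, not the data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Class-weighted cross-entropy for a 9:1 binary problem (illustrative weights):
# minority-class errors contribute more to the loss.
weighted_ce = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 9.0]))

def focal_loss(logits, targets, gamma=2.0):
    """Minimal focal-loss sketch: down-weights easy, well-classified examples."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)                      # probability of the true class
    return ((1.0 - pt) ** gamma * ce).mean()

logits = torch.randn(8, 2)
targets = torch.randint(0, 2, (8,))
print(weighted_ce(logits, targets).item(), focal_loss(logits, targets).item())
```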

Chapter 9, Hybrid Deep Learning Methods

In this chapter, we will talk about some of the hybrid deep learning techniques that combine data-level and algorithm-level methods in various ways. This chapter contains some recent and more advanced techniques that can be challenging to implement, so it is recommended to have a good understanding of the previous chapters. We will continue to explore strategies to tackle class imbalance in deep learning, examining techniques that manipulate data distribution and prioritize challenging examples. We will also go over techniques called hard example mining and minority class incremental rectification, which focus on improving model performance through prioritization of difficult instances and iterative enhancement of minority class representation, respectively.
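
To make "hard example mining" a little more concrete, here is a minimal sketch of how such a loss might be written (an assumption for illustration, not the book's implementation): it keeps only the hardest fraction of each batch when computing the loss.

```python
import torch
import torch.nn.functional as F

def ohem_loss(logits, targets, keep_ratio=0.25):
    """Online hard example mining sketch: backpropagate only through the
    hardest keep_ratio fraction of the batch (keep_ratio is illustrative)."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    k = max(1, int(keep_ratio * per_sample.numel()))
    hardest, _ = torch.topk(per_sample, k)   # largest losses = hardest examples
    return hardest.mean()

logits = torch.randn(32, 2, requires_grad=True)
targets = torch.randint(0, 2, (32,))
loss = ohem_loss(logits, targets)
loss.backward()
print(loss.item())
```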

Chapter 10, Model Calibration

In this chapter, we will see why the prediction scores we get from trained models often need post-processing. This can be helpful either during real-time prediction from the model or during offline evaluation at training time. We will also look at some ways of measuring how well calibrated a model is, and why imbalanced datasets make model calibration essential.
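
As a rough sketch of what calibration looks like in code (illustrative, not from the book), scikit-learn provides both a reliability-curve diagnostic and post-hoc calibrators:

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1_000, class_weight="balanced").fit(X_tr, y_tr)

# Reliability curve: how well predicted probabilities match observed frequencies.
prob_true, prob_pred = calibration_curve(y_te, clf.predict_proba(X_te)[:, 1], n_bins=10)
print(list(zip(prob_pred.round(2), prob_true.round(2))))

# Cross-validated isotonic calibration of the same model family.
calibrated = CalibratedClassifierCV(
    LogisticRegression(max_iter=1_000, class_weight="balanced"),
    method="isotonic", cv=3,
).fit(X_tr, y_tr)
print(calibrated.predict_proba(X_te)[:5, 1])
```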

About the authors


Kumar Abhishek

Kumar Abhishek is a seasoned Senior Machine Learning Engineer at Expedia Group, US, specializing in risk analysis and fraud detection. With over a decade of machine learning and software engineering experience, Kumar has worked for companies such as Microsoft, Amazon, and a Bay Area startup. Kumar holds a master's in Computer Science from the University of Florida, Gainesville. You can find him as Kumar Abhishek on LinkedIn.


Dr. Mounir Abdelaziz

Dr. Mounir Abdelaziz is a deep learning researcher specializing in computer vision applications. He holds a Ph.D. in computer science and technology from Central South University, China. During his Ph.D. journey, he developed innovative algorithms to address practical computer vision challenges. He has also authored numerous research articles in the field of few-shot learning for image classification. You can find him as Dr. Mounir Abdelaziz on LinkedIn.


Now Available

Machine Learning for Imbalanced Data is available in print and ebook formats from your favorite bookstore.