Paper Reading Notes – Data Augmentation

1. Core Idea: What Is Data Augmentation?

In summary, data augmentation artificially creates more, and more diverse, new data from existing training data through a range of technical means.
A simple analogy:
Your model is like a student, and the training data is its exercise book.

  • Scenario 1 (no data augmentation): The student has only a thin exercise book and prepares by practicing those questions over and over. Once he enters the exam room, he finds the question types have all changed and fails outright. This is overfitting – he has merely memorized the original questions by rote.
  • Scenario 2 (with data augmentation): From the existing exercise book we generate ten thick new ones by changing the question conditions, numbers, and phrasing. By working through these questions – endlessly varied but built on the same principles – the student truly understands the underlying knowledge. In the exam room, however the question types change, he can respond flexibly. This is how data augmentation improves a model’s generalization ability.

So, the fundamental purpose of data augmentation is to:

  1. Increase data volume to prevent overfitting of the model.
  2. Enhance data diversity, letting the model see more possible scenarios and learn more robust, essential features.

2. Why Do We Need Data Augmentation? (Necessity)

  1. Data hunger: Deep learning models are “data monsters”: with millions to tens of millions of parameters, they need massive amounts of data to train fully and avoid overfitting. But collecting and annotating high-quality data is extremely expensive.
  2. Covering the “long tail”: The real world is complex and diverse, and there are always rare scenes (such as profile views, occlusion, or unusual lighting). An original dataset can hardly cover every situation; data augmentation can simulate these edge cases and make the model more robust.
  3. Introducing invariance: We want the model to recognize a cat whether it is upright or upside down, in a bright place or a dark one. Through augmentation we actively teach the model these invariances (rotation invariance, illumination invariance, etc.).

3. Common Data Augmentation Techniques

Data augmentation techniques fall into two main categories: basic augmentation and advanced augmentation.

A. Basic Augmentation (Pixel-Level / Spatial Transformations)

These methods operate directly on the pixels or geometry of the image itself; they are simple and effective.

  1. Geometric transformations
    • Rotate/Flip: Rotate the image clockwise by 10 degrees, 20 degrees, etc., or flip it horizontally or vertically. This teaches the model that the target’s orientation is not a key feature.
    • Crop: Randomly extract a portion from the image. This forces the model to not rely on the absolute position of the target and focus on local features.
    • Scale/Stretch: Enlarge or shrink the image, or stretch it non-proportionally.
    • Translate: Shift the image up, down, left, or right within the canvas.
  2. Pixel transformation
    • Color jitter: Adjust the image’s brightness, contrast, saturation, and hue, so the model does not depend on a specific color distribution.
    • Add noise: Add Gaussian noise, salt-and-pepper noise, etc. to the image, making the model less sensitive to image quality and more resistant to interference.
    • Blur/Sharpen: Apply filters such as Gaussian blur to simulate images that are out of focus or captured from a distance.
    • Erase: Randomly set a small rectangular region of the image to 0 or to random values. This very effective method forces the model not to rely on a single obvious feature (e.g., recognizing cats solely by their faces) but to learn multiple features.
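The basic operations above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming images are `(H, W, C)` uint8 arrays; the function names are my own, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def horizontal_flip(img):
    """Mirror the image left-right (width is the second axis)."""
    return img[:, ::-1]

def random_crop(img, size):
    """Cut a random size x size patch, forcing focus on local features."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def add_gaussian_noise(img, std=10.0):
    """Perturb pixel values, making the model less sensitive to image quality."""
    noisy = img.astype(np.float64) + rng.normal(0.0, std, img.shape)
    return np.clip(noisy, 0, 255).astype(img.dtype)

def random_erase(img, size):
    """Zero out a random size x size rectangle (the 'Erase' method above)."""
    out = img.copy()
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    out[top:top + size, left:left + size] = 0
    return out

img = rng.integers(0, 256, (32, 32, 3), dtype=np.uint8)  # toy "image"
aug = random_erase(add_gaussian_noise(horizontal_flip(img)), size=8)
```

In practice these transforms are chained randomly per sample, as in the last line, so the model rarely sees the exact same input twice.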
B. Advanced Augmentation (Mixing and Intelligent Augmentation)

These methods are more “intelligent”: they usually mix multiple images or their features to generate more complex and challenging samples.

  1. MixUp
    • Method: Take two images x1 and x2, along with their labels y1 and y2, and mix them linearly with ratio λ:
      • New image: x_new = λ * x1 + (1 − λ) * x2
      • New label: y_new = λ * y1 + (1 − λ) * y2
    • Idea: Teach the model smoother, “fuzzy” decision boundaries, improving generalization. The labels are mixed as well, e.g. the new image is 60% “cat” and 40% “dog”.
  2. CutMix
    • Method: Randomly cut a region out of image A and fill it with the corresponding region of image B to generate a new image. The label is likewise mixed according to the area of the cut region:
      • New image: x_new = M * x_A + (1 − M) * x_B (M is a binary mask marking the cut region)
      • New label: y_new = λ * y_A + (1 − λ) * y_B
    • Advantages: Compared with MixUp, the generated images look more natural (the pasted content is a whole block, not a pixel-wise blend), and regional localization information is preserved. It increases diversity without completely losing important feature information.
  3. Model/Adversarial Enhancement
    • Neural style transfer: Retain the content of image A, but apply the style of image B to generate a new training image.
    • Generative Adversarial Network (GAN): Use a GAN to directly generate new, realistic training images. This is the “ultimate” data augmentation, but the technique is complex and may introduce artifacts.
    • AutoAugment/RandAugment: Use reinforcement learning or search algorithms to automatically find the most effective combination of augmentation strategies for a specific dataset.
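The MixUp and CutMix formulas above translate almost directly into NumPy. A minimal sketch, using one-hot label vectors and toy constant-valued “images”; λ is sampled from Beta(α, α) as in the MixUp formulation, and all other names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x1, y1, x2, y2, alpha=0.2):
    """MixUp: blend two images and their one-hot labels with ratio lam."""
    lam = rng.beta(alpha, alpha)            # lam ~ Beta(alpha, alpha)
    x_new = lam * x1 + (1 - lam) * x2       # pixel-wise blend
    y_new = lam * y1 + (1 - lam) * y2       # labels mixed in the same ratio
    return x_new, y_new

def cutmix(x1, y1, x2, y2):
    """CutMix: paste a random rectangle of x2 into x1; mix labels by area."""
    h, w = x1.shape[:2]
    ch, cw = rng.integers(1, h + 1), rng.integers(1, w + 1)  # patch size
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    x_new = x1.copy()                       # M * x1 + (1 - M) * x2 via slicing
    x_new[top:top + ch, left:left + cw] = x2[top:top + ch, left:left + cw]
    lam = 1 - (ch * cw) / (h * w)           # fraction of x1 that remains
    y_new = lam * y1 + (1 - lam) * y2
    return x_new, y_new

cat = np.full((32, 32, 3), 0.2)             # toy "cat" image
dog = np.full((32, 32, 3), 0.8)             # toy "dog" image
y_cat, y_dog = np.array([1.0, 0.0]), np.array([0.0, 1.0])

xm, ym = mixup(cat, y_cat, dog, y_dog)
xc, yc = cutmix(cat, y_cat, dog, y_dog)
```

Note the contrast the section describes: `xm` is a pixel-wise blend of both images everywhere, while `xc` keeps each pixel from exactly one source image, which is why CutMix outputs look more natural.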