• Competition End time: 2022-10-17
  • Submission Format: notebook <= 9h

Key Features

  1. No training data provided.
  2. Ensembles cannot easily be made to work (thread)


Evaluation

The evaluation metric is mean Precision @ 5 (mP@5), with a small modification to avoid penalizing queries with fewer than 5 expected index images:

\[mP@5 = \frac{1}{Q} \sum_{q=1}^Q \frac{1}{\min(n_q, 5)} \sum_{j=1}^{\min(n_q, 5)} rel_q(j)\]
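The formula above can be sketched directly in code; the function name and the input format (ranked prediction lists plus ground-truth sets) are illustrative assumptions, not the official scoring script:

```python
import numpy as np

def mean_precision_at_5(predictions, relevant):
    """mP@5 sketch: average per-query precision over the top min(n_q, 5) predictions.

    predictions: list of ranked lists of predicted index-image ids per query.
    relevant:    list of sets of ground-truth index-image ids per query (size n_q).
    """
    scores = []
    for preds, rel in zip(predictions, relevant):
        k = min(len(rel), 5)                      # the min(n_q, 5) modification
        hits = sum(1 for p in preds[:k] if p in rel)
        scores.append(hits / k)
    return float(np.mean(scores))
```

Note that a query with only one relevant index image can still score 1.0 if its top prediction is correct, which is the point of the min(n_q, 5) denominator.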
  • embedding dimension must be <= 64
  • the model must be compatible with TensorFlow 2.6.4 or PyTorch 1.11.0

The host will run a k-NN (k=5) lookup for each test sample, using the Euclidean distance between test and index embeddings.
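A minimal NumPy sketch of that lookup (brute-force distances; the host's actual retrieval implementation is not published):

```python
import numpy as np

def knn_lookup(test_emb, index_emb, k=5):
    """Return, for each test embedding, the indices of the k nearest
    index embeddings by Euclidean distance."""
    # squared Euclidean distance via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    d2 = (
        (test_emb ** 2).sum(axis=1, keepdims=True)
        - 2 * test_emb @ index_emb.T
        + (index_emb ** 2).sum(axis=1)
    )
    return np.argsort(d2, axis=1)[:, :k]
```

Squared distances give the same ranking as true Euclidean distances, so the square root is skipped.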


No training data is provided. Here is the distribution of the test data.

  1. External data thread: https://www.kaggle.com/competitions/google-universal-image-embedding/discussion/337384

Great Notebooks

  1. CLIP-TF-Train-Example
    1. CLIP + Arcface + TPU training.
  2. GCVIT
    1. Global Context Vision Transformer
  3. Understand Comp Domain and ImageNet 21k Labels
    1. Analysis of the competition domain and the ImageNet-21k labels


Top Solutions

  1. 1st place solution
    1. Start from pre-trained weights, without any training or fine-tuning first.
    2. CLIP Github
    3. ArcFace
    4. Add datasets to training list iteratively to save time and maintain good performance
    5. Unfreeze the backbone only after the linear head is well trained, so the randomly initialized head does not disturb the backbone weights.
      1. Use a 10x lower initial learning rate when unfreezing.
      2. Otherwise the model overfits easily and the linear projection weights jump sharply.
      3. So freeze the linear head while training the backbone, and add dropout to the fully connected layer.
    6. Clever ensemble to overcome different F(C, X) issues
      1. resolution 224 + resolution 280
    7. LAION-5B CLIP Model blog
  2. 2nd place solution
    1. dynamic margin
    2. stratified learning rates when training the non-backbone parts.
  3. 4th place solution
  4. 5th place solution
  5. Remaining solutions: thread