CLIP: Contrastive Language-Image Pre-training (image-to-text)

CLIP architecture diagram showing image encoder and text encoder with contrastive learning.
CLIP training process: matching image-text pairs via contrastive objective.
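
A minimal sketch of the contrastive objective described above, in plain Python so it runs without dependencies. It assumes a batch where image i and text i are the matching pair, cosine similarity between L2-normalized embeddings, and a fixed temperature of 0.07 (CLIP learns this value during training; fixing it here is a simplification). The function names `l2_normalize`, `cross_entropy_diagonal`, and `clip_style_loss` and the toy embeddings are made up for illustration and are not from any CLIP codebase.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # logits[i][j] = cosine similarity of image i and text j, scaled
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image view
    loss_i2t = cross_entropy_diagonal(logits)    # each image picks its text
    loss_t2i = cross_entropy_diagonal(logits_t)  # each text picks its image
    return 0.5 * (loss_i2t + loss_t2i)


# Toy batch: hypothetical 2-D embeddings standing in for encoder outputs.
image_embs = [[1.0, 0.0], [0.0, 1.0]]
matched_text = [[0.9, 0.1], [0.1, 0.9]]   # text i is close to image i
swapped_text = [[0.1, 0.9], [0.9, 0.1]]   # pairs deliberately mismatched

loss_matched = clip_style_loss(image_embs, matched_text)
loss_swapped = clip_style_loss(image_embs, swapped_text)
print(f"matched: {loss_matched:.6f}  swapped: {loss_swapped:.6f}")
```

The loss is low when each image is most similar to its own text and high when the pairs are swapped, which is the signal that pulls matching pairs together and pushes mismatched pairs apart in the shared embedding space.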

Look for Contrastive loss in related notes.