SAM3 is probably a great model to distill from when training smaller segmentation models, but isn't their DINOv2 a better example of a large foundation model to distill from for various computer vision tasks? I've seen it used for as starting point for models doing segmentation and depth estimation. Maybe there's a v3 coming soon?
Could you expand on that? Do you mean you're starting with the pretrained DINO model and then using SAM3 to generate training data to make DINO into a segmentation model? Do you freeze the DINO weights and add a small adapter at the end to turn its output into segmentations?
https://dinov2.metademolab.com/