Segment images into objects, instances, or scenes
Predict a depth map from a single image
Annotate and describe images with text prompts
A tiny vision language model