CLIP (Contrastive Language-Image Pre-Training) is a neural network trained on a variety of (image, text) pairs. It can be instructed in natural language to predict the most relevant text snippet, given an image, without directly optimizing for the task, similarly to the zero-shot capabilities of GPT-2 and 3. The model was proposed in Learning Transferable Visual Models From Natural Language Supervision by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.

The abstract from the paper is the following:

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet.
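Because CLIP is instructed in natural language, zero-shot prediction amounts to scoring an image against a set of candidate captions. Below is a minimal sketch using the Hugging Face transformers CLIPModel and CLIPProcessor classes; the checkpoint name, image URL, and candidate captions are illustrative assumptions, not part of the text above.

```python
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a publicly released CLIP checkpoint (illustrative choice).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Any image will do; this COCO validation image is just an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Candidate captions act as the "labels" for zero-shot classification.
captions = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores
probs = logits_per_image.softmax(dim=1)      # probability over the captions
print(dict(zip(captions, probs[0].tolist())))
```

The key point mirrors the pre-training task described in the abstract: the model picks the caption most likely to go with the image, so new "classes" can be added just by writing new captions, with no additional labeled data.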