ALEXANDRIA, Va., Dec. 31 -- United States Patent no. 12,511,876, issued on Dec. 30, was assigned to NEC Corp. (Tokyo).
"Single stream multi-level alignment for vision-language pretraining" was invented by Vijay Kumar Baikampady Gopalkrishna (Santa Clara, Calif.), Xiang Yu (Mountain View, Calif.) and Samuel Schulter (Long Island City, N.Y.).
According to the abstract* released by the U.S. Patent & Trademark Office: "A method is provided for pretraining vision and language models that includes receiving image-text pairs, each including an image and a text describing the image. The method encodes an image into a set of feature vectors corresponding to input image patches and a CLS token which represents a global image feature. The method par...