ALEXANDRIA, Va., June 12 -- United States Patent No. 12,299,961, issued on May 13, was assigned to Salesforce Inc. (San Francisco).
"Systems and methods for unified vision-language understanding and generation" was invented by Junnan Li (Singapore) and Chu Hong Hoi (Singapore).
According to the abstract* released by the U.S. Patent & Trademark Office: "Embodiments described herein provide systems, methods, and devices for pre-training a multimodal encoder-decoder (MED) model for vision-language tasks. A method may include encoding, by an image encoder of the MED, an image into an image representation; encoding, by a text encoder of the MED, a text into a text representation; generating, by an image-grounded text encoder of the MED, a mult...