ALEXANDRIA, Va., Oct. 28 -- United States Patent No. 12,450,888, issued on Oct. 21, was assigned to Tsinghua University (Beijing).
"Hierarchical audio-visual feature fusing method for audio-visual question answering and product" was invented by Wenwu Zhu (Beijing), Xin Wang (Beijing) and Pinci Yang (Beijing).
According to the abstract* released by the U.S. Patent & Trademark Office: "A hierarchical audio-visual feature fusing method for audio-visual question answering and a product relate to the field of audio-visual question answering. By fusing audio embedding in an input video clip with a baseline model as well as video embedding and question embedding respectively at an early stage, a middle stage and a late stage in a hierarchical fe...