News
In this work, we explore neat yet effective Transformer-based frameworks for visual grounding. Previous methods generally address the core problem of visual grounding, i.e., multi-modal fusion and ...