Question Answering

A Multi-Stage Vision-Language Framework for Knowledge-based Visual Question Answering

Devised a two-stage VLM-based pipeline that first draws on the knowledge stored in language models (LMs) to sample multiple candidate answers, then uses CLIP to select the most likely one.
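The sample-then-select idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's actual implementation: `sample_candidates`, `select_answer`, the `sampler` callable (standing in for a language model), and the `embed_text` callable (standing in for a CLIP text encoder) are all hypothetical names, and ranking by cosine similarity between an image embedding and each candidate's text embedding is an assumed scoring rule.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sample_candidates(
    question: str,
    sampler: Callable[[str], str],  # hypothetical stand-in for an LM call
    n_samples: int,
) -> List[str]:
    """Stage 1: sample several answers and keep the distinct ones in order."""
    candidates: List[str] = []
    for _ in range(n_samples):
        answer = sampler(question).strip().lower()
        if answer and answer not in candidates:
            candidates.append(answer)
    return candidates


def select_answer(
    image_vec: Vector,
    candidates: Sequence[str],
    embed_text: Callable[[str], Vector],  # hypothetical stand-in for a CLIP text encoder
) -> str:
    """Stage 2: return the candidate whose embedding best matches the image."""
    if not candidates:
        raise ValueError("no candidate answers to choose from")
    return max(candidates, key=lambda c: cosine(image_vec, embed_text(c)))
```

In practice, `sampler` would wrap a language model queried with sampling enabled, while `image_vec` and `embed_text` would come from a CLIP image and text encoder; how each candidate is phrased before encoding is left open here.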

Dec 5, 2023