The proposed Object Cluster Module plays a crucial role in enabling the model to achieve a deeper, more holistic understanding of both linguistic and visual information.