Neural networks made easy (Part 79): Feature Aggregated Queries (FAQ) in the context of state
08-22-2024 at 12:32 PM
Object detection in video has its own specific characteristics: it must cope with changes in object features caused by motion, a problem that does not arise in still images. One solution is to use temporal information and combine features from adjacent frames. The paper "FAQ: Feature Aggregated Queries for Transformer-based Video Object Detectors" proposes a new approach to video object detection. Its authors improve the quality of queries for Transformer-based models by aggregating them. To this end, they propose a practical method for generating and aggregating queries according to the features of the input frames. The extensive experimental results in the paper support the effectiveness of the method. The proposed approaches can be extended to a wide range of image and video object detection methods to improve their efficiency.
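To make the idea of aggregating queries according to frame features more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes each frame is summarized by one feature vector and has one query vector, and that the aggregation weights come from a softmax over the cosine similarity between the target frame's features and those of every frame in the window. The names `cosine_similarity`, `softmax` and `aggregate_queries` are hypothetical and exist only for this illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as "not similar" instead of dividing by zero
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_queries(frame_features, frame_queries, target):
    """Blend the per-frame queries into one query for frame `target`.

    Assumption for this sketch: frames whose features resemble the target
    frame contribute more to the aggregated query.
    """
    scores = [cosine_similarity(frame_features[target], f) for f in frame_features]
    weights = softmax(scores)
    dim = len(frame_queries[0])
    aggregated = [
        sum(w * q[d] for w, q in zip(weights, frame_queries))
        for d in range(dim)
    ]
    return aggregated, weights


# Three frames: the first two look alike, the third is different.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
queries = [[1.0, 0.0], [3.0, 0.0], [0.0, 5.0]]
aggregated, weights = aggregate_queries(features, queries, target=0)
```

In this toy example the two similar frames receive equal weights and the dissimilar third frame receives a smaller one, so the aggregated query is dominated by the frames that resemble the target. A real detector would work with many queries per frame and learned projections, which this sketch leaves out.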