Neural networks made easy (Part 37): Sparse Attention
02-26-2024 at 05:19 AM
In the previous article, we discussed relational models, which use attention mechanisms in their architecture. We used such a model to create an Expert Advisor, and the resulting EA showed good results. However, we noticed that the model learned more slowly than in our earlier experiments. The reason is that the transformer block used in the model is a rather complex architectural solution that performs a large number of operations. The number of these operations grows quadratically with the length of the analyzed sequence, which increases both memory consumption and model training time.
At the same time, the resources available for training and improving the model are limited. We therefore need to optimize the model with minimal loss of quality.
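To make the quadratic growth concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the article's code: the function name `attention_cost`, the vector size `d_model=16`, and the sequence lengths are illustrative assumptions. It counts only the two large matrix products of dense self-attention for a single head and ignores the Query/Key/Value projections and the softmax.

```python
def attention_cost(seq_len: int, d_model: int) -> dict:
    """Rough cost of dense self-attention for one head and one sequence.

    Hypothetical helper for illustration only; real implementations add
    projection layers, softmax, and several heads on top of this.
    """
    # The score matrix Q * K^T holds one value per pair of sequence elements.
    score_elements = seq_len * seq_len
    # Q * K^T and (scores * V) each need about seq_len * seq_len * d_model
    # multiply-add operations.
    multiply_adds = 2 * seq_len * seq_len * d_model
    return {"score_elements": score_elements, "multiply_adds": multiply_adds}


# Doubling the sequence length quadruples both numbers.
for n in (32, 64, 128, 256):
    cost = attention_cost(n, d_model=16)
    print(n, cost["score_elements"], cost["multiply_adds"])
```

Under these assumptions, each doubling of the sequence length multiplies both the size of the score matrix and the operation count by four, which is the growth that sparse attention tries to curb.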