
Something to read

This is a discussion on Something to read within the Forex Trading forums, part of the Trading Forum category. The opening article considers the theoretical application of quantization in the construction of tree models. No complex mathematical equations are used. ...

  1. #441
     mql5

    Quantization in machine learning (Part 1): Theory, sample code, analysis of implementation in CatBoost

    The article considers the theoretical application of quantization in the construction of tree models. No complex mathematical equations are used. While writing the article, I discovered that the scientific works of different authors lack an established, unified terminology, so I will choose the terminology options that, in my opinion, best reflect the meaning. Besides, I will use terms of my own for matters left unaddressed by other researchers. This article uses terms and concepts I have previously described in the article "CatBoost machine learning algorithm from Yandex without learning Python or R", so I recommend reading that one first.
    more...
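
    To make the idea concrete, here is a minimal Python sketch (not code from the article) of uniform quantization: a continuous feature is split into a fixed number of buckets, which is what tree boosting libraries such as CatBoost do internally before choosing splits. Function names and the bucket count are illustrative.

        import numpy as np

        def uniform_borders(x: np.ndarray, n_buckets: int = 16) -> np.ndarray:
            """Split the feature range [min, max] into n_buckets equal intervals."""
            return np.linspace(x.min(), x.max(), n_buckets + 1)[1:-1]  # interior borders

        def quantize(x: np.ndarray, borders: np.ndarray) -> np.ndarray:
            """Replace each value with the index of its bucket."""
            return np.searchsorted(borders, x, side="right").astype(np.uint8)

        x = np.random.default_rng(0).normal(size=10_000)
        xq = quantize(x, uniform_borders(x))
        print(xq.min(), xq.max())  # bucket indices in [0, 15]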

  2. #442
     mql5

    Neural networks made easy (Part 62): Using Decision Transformer in hierarchical models

    Previously, we considered hierarchical models for solving problems posed, so to speak, in the classical Markov-process formulation. However, the advantages of hierarchical approaches also extend to sequence analysis problems. One such algorithm is the Control Transformer, presented in the article "Control Transformer: Robot Navigation in Unknown Environments through PRM-Guided Return-Conditioned Sequence Modeling". Its authors position it as a new architecture designed to solve complex control and navigation problems based on reinforcement learning. It combines modern techniques from reinforcement learning, planning and machine learning, which makes it possible to create adaptive control strategies in a variety of environments.

    Control Transformer opens up new prospects for solving complex control problems in robotics, autonomous driving and other fields. Let's look at how this method might be applied to our trading problems.
    more...
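
    As a rough Python illustration of the return-conditioned sequence modeling the excerpt mentions (the PRM-guided planning layer of Control Transformer is omitted), here is the token layout a Decision-Transformer-style model consumes; all shapes are toy values.

        import numpy as np

        def returns_to_go(rewards: np.ndarray) -> np.ndarray:
            """Suffix sums of rewards: R_t = r_t + r_{t+1} + ..."""
            return np.cumsum(rewards[::-1])[::-1]

        rng = np.random.default_rng(1)
        states  = rng.normal(size=(4, 3))        # 4 steps, 3-dim states
        actions = rng.normal(size=(4, 1))
        rewards = np.array([0.0, 1.0, 0.0, 2.0])

        rtg = returns_to_go(rewards)             # [3., 3., 2., 2.]
        # one (return-to-go, state, action) triple per step is fed to the model
        tokens = [(rtg[t], states[t], actions[t]) for t in range(len(rewards))]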

  3. #443
     mql5

    Quantization in machine learning (Part 2): Data preprocessing, table selection, training CatBoost models

    The article considers the practical application of quantization in the construction of tree models. No complex mathematical equations are used. This is the second part of the article "Quantization and other methods of preprocessing input data in machine learning", so I strongly recommend reading the first part before this one. Here we will cover the following:

    • In the first section, we consider the methods for preprocessing sample data implemented in MQL5.
    • In the second section, we conduct an experiment that provides evidence on the feasibility of data quantization.
    more...
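
    For readers who want to experiment right away, a hedged Python sketch (assuming the catboost package is installed; the data here is synthetic): the quantization grid of a CatBoost model is controlled through the border_count and feature_border_type parameters.

        import numpy as np
        from catboost import CatBoostClassifier, Pool

        rng = np.random.default_rng(2)
        X = rng.normal(size=(1_000, 5))
        y = (X[:, 0] + rng.normal(scale=0.5, size=1_000) > 0).astype(int)

        model = CatBoostClassifier(
            iterations=100,
            border_count=32,                # number of buckets per feature
            feature_border_type="Uniform",  # how border positions are chosen
            verbose=False,
        )
        model.fit(Pool(X, y))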

  4. #444
     mql5

    Neural networks made easy (Part 63): Unsupervised Pretraining for Decision Transformer (PDT)

    PDT jointly learns an embedding space of future trajectories as well as a future prior conditioned only on past information. By conditioning action prediction on the target future embedding, PDT is endowed with the ability to "reason over the future". This ability is naturally task-independent and generalizes to different task specifications.

    To achieve efficient online fine-tuning in downstream tasks, the framework can easily be adapted to new conditions by associating each future embedding with its return, which is realized by training a reward prediction network for each future embedding.
    more...
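
    A hedged PyTorch sketch of the three pieces the excerpt names: an encoder of future trajectory segments, a prior over future embeddings conditioned only on the past, and a policy conditioned on the target embedding, plus the reward head used for fine-tuning. All module shapes are illustrative, not the paper's.

        import torch
        import torch.nn as nn

        STATE, ACT, EMB = 8, 2, 16

        future_encoder = nn.GRU(STATE + ACT, EMB, batch_first=True)  # future -> z
        prior          = nn.GRU(STATE + ACT, EMB, batch_first=True)  # past -> p(z)
        policy = nn.Sequential(nn.Linear(STATE + EMB, 64), nn.ReLU(),
                               nn.Linear(64, ACT))                   # (s, z) -> a
        reward_head = nn.Linear(EMB, 1)  # return predictor for online fine-tuning

        past   = torch.randn(4, 10, STATE + ACT)  # batch of past segments
        future = torch.randn(4, 10, STATE + ACT)  # matching future segments
        state  = torch.randn(4, STATE)

        _, z  = future_encoder(future)            # target future embedding
        _, zp = prior(past)                       # predicted from the past alone
        action = policy(torch.cat([state, z.squeeze(0)], dim=-1))
        prior_loss = nn.functional.mse_loss(zp, z.detach())  # align prior with encoder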

  5. #445
     mql5

    Neural networks made easy (Part 64): ConserWeightive Behavioral Cloning (CWBC) method

    The Decision Transformer and all its modifications, which we discussed in recent articles, belong to the family of Behavior Cloning (BC) methods. We train models to repeat actions from "expert" trajectories, conditioned on the state of the environment and the target outcome. Thus, we teach the model to imitate the expert's behavior in the current state of the environment in order to achieve the target.
    more...
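
    The core objective is plain return-conditioned behavior cloning; the conservative reweighting that gives CWBC its name is omitted here. A minimal PyTorch sketch with illustrative dimensions:

        import torch
        import torch.nn as nn

        STATE, ACT = 8, 2
        policy = nn.Sequential(nn.Linear(STATE + 1, 64), nn.ReLU(),
                               nn.Linear(64, ACT))

        states  = torch.randn(256, STATE)  # batch from "expert" trajectories
        returns = torch.randn(256, 1)      # target outcome (return-to-go)
        actions = torch.randn(256, ACT)    # actions the expert actually took

        pred = policy(torch.cat([states, returns], dim=-1))
        bc_loss = nn.functional.mse_loss(pred, actions)  # imitate the expert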

  6. #446
     mql5

    Neural networks made easy (Part 65): Distance Weighted Supervised Learning (DWSL)

    Behavior cloning methods, largely based on the principles of supervised learning, show fairly good results. But their main problem remains the need for ideal examples to imitate, which are sometimes very difficult to collect. Reinforcement learning methods, in turn, are able to work with non-optimal raw data, although they may find only suboptimal policies for achieving the goal. Moreover, when searching for an optimal policy, we often face an optimization problem that becomes especially acute in high-dimensional and stochastic environments.

    To bridge the gap between these two approaches, a group of scientists proposed the Distance Weighted Supervised Learning (DWSL) method, presented in the article "Distance Weighted Supervised Learning for Offline Interaction Data". It is an offline supervised learning algorithm for goal-conditioned policies. Theoretically, DWSL converges to an optimal policy whose return is bounded from below at the level of the trajectories in the training set. The practical examples in the article demonstrate the superiority of the proposed method over imitation learning and reinforcement learning algorithms. Let's take a closer look at the DWSL algorithm and evaluate its strengths and weaknesses in solving our practical problems.
    more...
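
    The weighting idea can be sketched in a few lines of Python: each (state, goal) sample is weighted by how few steps separate the state from the goal within its own trajectory, so shorter paths dominate the supervised loss. The exponential form and beta below are illustrative; DWSL derives its weights from a learned distance distribution.

        import numpy as np

        def distance_weights(t_state: np.ndarray, t_goal: np.ndarray,
                             beta: float = 0.1) -> np.ndarray:
            """exp(-beta * steps-to-goal), normalized over the batch."""
            w = np.exp(-beta * (t_goal - t_state).astype(float))
            return w / w.sum()

        t_state = np.array([0, 3, 5, 9])   # time index of each sampled state
        t_goal  = np.array([8, 8, 8, 12])  # time index of the goal reached later
        print(distance_weights(t_state, t_goal))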

  7. #447
     mql5

    Neural networks made easy (Part 68): Offline Preference-guided Policy Optimization

    Reinforcement learning is a universal framework for learning optimal behavior policies in the environment under exploration. Policy optimality is achieved by maximizing the rewards received from the environment during interaction with it. But herein lies one of the main problems of this approach: creating an appropriate reward function often requires significant human effort. Additionally, rewards may be sparse and/or insufficient to express the true learning goal. As one way of solving this problem, the authors of the paper "Beyond Reward: Offline Preference-guided Policy Optimization" suggested the OPPO (Offline Preference-guided Policy Optimization) method. They propose replacing the reward given by the environment with the preferences of a human annotator between two trajectories completed in the environment under exploration. Let's take a closer look at the proposed algorithm.
    more...
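
    The preference side of such methods is usually a Bradley-Terry objective: a score model is trained so that the preferred trajectory of each pair receives the higher score. A hedged PyTorch sketch (OPPO itself optimizes the policy and the preference model jointly offline; shapes are toy values):

        import torch
        import torch.nn as nn

        TRAJ = 32                                # flattened trajectory features
        score = nn.Sequential(nn.Linear(TRAJ, 64), nn.ReLU(), nn.Linear(64, 1))

        tau_a = torch.randn(128, TRAJ)           # first trajectory of each pair
        tau_b = torch.randn(128, TRAJ)           # second trajectory of each pair
        pref  = torch.randint(0, 2, (128,)).float()  # 1 if annotator prefers tau_a

        logits = score(tau_a).squeeze(-1) - score(tau_b).squeeze(-1)
        loss = nn.functional.binary_cross_entropy_with_logits(logits, pref)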

  8. #448
     mql5

    Neural networks made easy (Part 69): Density-based support constraint for the behavioral policy (SPOT)

    Offline reinforcement learning allows the training of models on data collected from interactions with the environment. This significantly reduces the amount of environment interaction required. Moreover, given the complexity of modeling the environment, we can collect real-time data from multiple exploration agents and then train the model on this data.

    At the same time, using a static training dataset significantly reduces the environment information available to us. Due to limited resources, we cannot preserve the entire diversity of the environment in the training dataset.
    more...
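
    In outline, the density-based support constraint adds a penalty to the usual Q-maximization objective whenever the policy's action is unlikely under a model of the behavior data. SPOT estimates that density with a VAE; the placeholder log_density in this Python sketch stands in for such an estimate.

        import torch

        def spot_objective(q_value: torch.Tensor, log_density: torch.Tensor,
                           lam: float = 1.0) -> torch.Tensor:
            """Maximize Q while keeping actions inside the data support."""
            return -(q_value + lam * log_density).mean()

        q_value     = torch.randn(256)        # Q(s, pi(s)) from a learned critic
        log_density = torch.randn(256) - 2.0  # log p_behavior(pi(s) | s), estimated
        print(spot_objective(q_value, log_density))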

  9. #449
     mql5

    Neural networks made easy (Part 70): Closed-Form Policy Improvement Operators (CFPI)

    The approach of optimizing the Agent's policy under constraints on its behavior has turned out to be promising for solving offline reinforcement learning problems. By exploiting historical transitions, the Agent's policy is trained to maximize a learned value function.

    A behavior-constrained policy helps avoid a significant distribution shift in the Agent's actions, which provides sufficient confidence in the estimates of action values. In the previous article, we got acquainted with the SPOT method, which exploits this approach. As a continuation of the topic, I propose to get acquainted with the Closed-Form Policy Improvement (CFPI) algorithm, presented in the paper "Offline Reinforcement Learning with Closed-Form Policy Improvement Operators".
    more...
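
    The flavor of a closed-form improvement step can be sketched as follows: starting from the behavior policy's mean action, take a single analytic step along the critic's gradient, with the step size bounding the distribution shift. CFPI derives its exact operators from a Taylor expansion of Q; this Python sketch keeps only the first-order shape of the idea.

        import torch

        def improve_action(q_fn, state, mu_action, alpha: float = 0.1):
            a = mu_action.clone().requires_grad_(True)
            (grad,) = torch.autograd.grad(q_fn(state, a).sum(), a)
            return mu_action + alpha * grad  # one closed-form step, no inner loop

        q_fn  = lambda s, a: -((a - 0.5) ** 2).sum(-1)  # toy concave critic
        state = torch.zeros(4, 3)
        mu    = torch.zeros(4, 2)            # behavior-policy mean actions
        print(improve_action(q_fn, state, mu))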

  10. #450
     mql5

    Neural networks made easy (Part 71): Goal-Conditioned Predictive Coding (GCPC)

    Goal-Conditioned Behavior Cloning (BC) is a promising approach for solving various offline reinforcement learning problems. Instead of assessing the value of states and actions, BC directly trains the Agent's behavior policy, learning the dependencies between the set goal, the analyzed environment state and the Agent's action. This is achieved using supervised learning methods on pre-collected offline trajectories. The familiar Decision Transformer method and its derivative algorithms have demonstrated the effectiveness of sequence modeling for offline reinforcement learning.
    more...
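
    A hedged PyTorch sketch of the two-stage shape the excerpt describes: first compress trajectories into a latent code (GCPC obtains it with masked future prediction, simplified to a plain GRU encoder here), then clone actions conditioned on the state, the goal and that code. Dimensions are illustrative.

        import torch
        import torch.nn as nn

        STATE, ACT, GOAL, LAT = 8, 2, 3, 16
        encoder = nn.GRU(STATE, LAT, batch_first=True)
        policy  = nn.Sequential(nn.Linear(STATE + GOAL + LAT, 64), nn.ReLU(),
                                nn.Linear(64, ACT))

        traj   = torch.randn(4, 20, STATE)   # offline trajectories
        state  = torch.randn(4, STATE)
        goal   = torch.randn(4, GOAL)
        expert = torch.randn(4, ACT)

        _, z = encoder(traj)                 # trajectory -> latent code
        pred = policy(torch.cat([state, goal, z.squeeze(0)], dim=-1))
        bc_loss = nn.functional.mse_loss(pred, expert)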
