Offline RL Made Easier: No TD Learning, Advantage Reweighting, or Transformers
BAIR
APRIL 20, 2022
A demonstration of the RvS policy we learn with just supervised learning and a depth-two MLP. It uses no TD learning, advantage reweighting, or Transformers!

Offline reinforcement learning (RL) is conventionally approached with value-based methods built on temporal difference (TD) learning.