Automatic Environment Shaping is the Next Frontier in RL

ICML 2024 Position Track Submission

Flowchart of a typical behavior generation pipeline, illustrating three distinct subtasks: sample environment generation, environment shaping, and an outer feedback loop with behavior reflection. We highlight manual, heuristic-driven environment shaping as a key yet often overlooked bottleneck in extending the success of RL to a wider range of scenarios, and advocate automating this process to broaden RL's applicability and democratize RL for robotics.


Reinforcement learning algorithms for robotics are frequently benchmarked on pre-constructed simulation environments. Each environment embeds heuristic, task-specific design choices, such as the shaping of the observation space, action space, terminal conditions, initial states, goal distribution, and reward. Because these choices are made by hand rather than automated, they remain an under-acknowledged bottleneck when applying techniques purported to be general-purpose to new robots and tasks. We introduce terminology for this problem, review relevant progress to date, and propose a new benchmark for evaluating automated environment shaping.
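To make the six design choices listed above concrete, here is a minimal sketch, assuming a toy one-dimensional point environment. It treats each choice as a pluggable function layered on top of a bare sample environment. All names (`PointEnv`, `ShapingSpec`, `ShapedEnv`, `hand_spec`) are hypothetical illustrations, not the paper's formalism or any library's API. Under this view, automated environment shaping would amount to searching over such specifications instead of hand-writing one per task.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


class PointEnv:
    """Hypothetical bare 'sample environment': a 1-D point moved by the raw action."""

    def __init__(self) -> None:
        self.pos = 0.0

    def reset(self, pos: float) -> float:
        self.pos = pos
        return self.pos

    def step(self, action: float) -> float:
        self.pos += action
        return self.pos


@dataclass
class ShapingSpec:
    """The six heuristic choices named in the abstract, as functions (illustrative only)."""
    observe: Callable[[float, float], List[float]]  # (state, goal) -> observation
    act: Callable[[float], float]                   # policy output -> raw action
    reward: Callable[[float, float], float]         # (state, goal) -> reward
    terminal: Callable[[float, float], bool]        # (state, goal) -> episode done?
    init_state: Callable[[random.Random], float]    # rng -> initial state
    goal: Callable[[random.Random], float]          # rng -> goal sample


class ShapedEnv:
    """Applies a ShapingSpec on top of the bare environment."""

    def __init__(self, base: PointEnv, spec: ShapingSpec, seed: int = 0) -> None:
        self.base, self.spec = base, spec
        self.rng = random.Random(seed)
        self.goal = 0.0

    def reset(self) -> List[float]:
        # Goal distribution and initial-state distribution are shaping choices.
        self.goal = self.spec.goal(self.rng)
        state = self.base.reset(self.spec.init_state(self.rng))
        return self.spec.observe(state, self.goal)

    def step(self, policy_output: float):
        # Action space, observation, reward, and termination are shaping choices too.
        state = self.base.step(self.spec.act(policy_output))
        obs = self.spec.observe(state, self.goal)
        return obs, self.spec.reward(state, self.goal), self.spec.terminal(state, self.goal)


# One hand-written spec: the kind of per-task heuristic an automated method would search over.
hand_spec = ShapingSpec(
    observe=lambda s, g: [s, g - s],              # expose the goal offset to the policy
    act=lambda u: max(-0.1, min(0.1, u)),         # clip the action magnitude
    reward=lambda s, g: -abs(g - s),              # dense distance penalty
    terminal=lambda s, g: abs(g - s) < 0.05,      # success threshold
    init_state=lambda rng: rng.uniform(-0.5, 0.5),
    goal=lambda rng: rng.uniform(-1.0, 1.0),
)
```

Usage: `env = ShapedEnv(PointEnv(), hand_spec, seed=0)`, then `obs = env.reset()` and `obs, r, done = env.step(1.0)`. Here the clipping rule in `act` limits the move to 0.1. Swapping in a different `ShapingSpec` changes the learning problem without touching `PointEnv`, which mirrors the separation between sample environment generation and environment shaping drawn in the figure.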