Agent Learning via Early Experience

Date: 2025-10-16

We introduce a paradigm we call early experience: interaction data generated by the agent's own actions, where the resulting future states serve as supervision without reward signals.

Within this paradigm we study two strategies for using such data:

(1) Implicit world modeling, which uses collected states to ground the policy in environment dynamics; and

(2) Self-reflection, where the agent learns from its suboptimal actions to improve reasoning and decision-making.
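
The two strategies can be illustrated with a minimal sketch. It is not the authors' implementation: the toy environment, the function names (`collect_early_experience`, `world_model_examples`, `reflection_prompts`), the prompt formats, and the use of expert demonstration states as starting points are all assumptions made for illustration. The only property taken from the text is that supervision comes from the observed next states and that no reward is read.

```python
import random
from dataclasses import dataclass
from typing import Callable, Hashable, List, Sequence


@dataclass(frozen=True)
class Transition:
    state: Hashable
    action: str
    next_state: Hashable
    is_expert: bool  # True if the action came from the (assumed) demonstration


def collect_early_experience(
    expert_states: Sequence[Hashable],
    expert_actions: Sequence[str],
    candidate_actions: Callable[[Hashable], List[str]],
    step: Callable[[Hashable, str], Hashable],
    k: int,
    rng: random.Random,
) -> List[Transition]:
    """Execute the expert action and up to k alternatives at each state.

    Only the resulting next state is recorded; no reward is observed.
    """
    data = []
    for s, a_exp in zip(expert_states, expert_actions):
        data.append(Transition(s, a_exp, step(s, a_exp), True))
        alternatives = [a for a in candidate_actions(s) if a != a_exp]
        for a in rng.sample(alternatives, min(k, len(alternatives))):
            data.append(Transition(s, a, step(s, a), False))
    return data


def world_model_examples(data: Sequence[Transition]) -> List[dict]:
    """(1) Implicit world modeling: predict the next state from (state, action)."""
    return [
        {
            "prompt": f"State: {t.state}\nAction: {t.action}\nNext state:",
            "target": str(t.next_state),
        }
        for t in data
    ]


def reflection_prompts(data: Sequence[Transition]) -> List[str]:
    """(2) Self-reflection: contrast the expert outcome with alternative outcomes.

    Returns prompts only; the reflection text itself would be produced by a
    language model, which this sketch does not include.
    """
    by_state = {}
    for t in data:
        by_state.setdefault(t.state, []).append(t)
    prompts = []
    for state, group in by_state.items():
        expert = [t for t in group if t.is_expert]
        others = [t for t in group if not t.is_expert]
        if not expert or not others:
            continue  # nothing to contrast at this state
        lines = [
            f"State: {state}",
            f"Expert action: {expert[0].action} -> {expert[0].next_state}",
        ]
        lines += [f"Alternative action: {t.action} -> {t.next_state}" for t in others]
        lines.append("Explain why the expert action is preferable.")
        prompts.append("\n".join(lines))
    return prompts


# Hypothetical toy environment: the state is an integer counter.
def toy_step(state: int, action: str) -> int:
    return state + {"inc": 1, "dec": -1, "noop": 0}[action]


def toy_actions(state: int) -> List[str]:
    return ["inc", "dec", "noop"]


data = collect_early_experience(
    [0, 1], ["inc", "inc"], toy_actions, toy_step, k=2, rng=random.Random(0)
)
print(len(data))  # 6: per state, 1 expert transition + 2 alternatives
```

In this sketch both strategies consume the same reward-free transitions: the first turns each one into a next-state prediction example, while the second groups transitions by state so that the outcomes of suboptimal actions can be compared against the expert's.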

We evaluate across eight diverse environments and multiple model families.

Our approaches consistently improve effectiveness and out-of-domain generalization, highlighting the value of early experience.

Moreover, in environments with verifiable rewards, our results provide promising signals that early experience offers a strong foundation for subsequent reinforcement learning, positioning it as a practical bridge between imitation learning and fully experience-driven agents.