
John Schulman

Co-invented Proximal Policy Optimization and led OpenAI's reinforcement-learning work on scalable alignment techniques.

John Schulman is a co-founder of OpenAI and a pioneering AI researcher best known for creating the Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) algorithms. He led the reinforcement-learning and post-training efforts that produced ChatGPT and, as of 2025, serves as Chief Scientist at the next-generation AI company Thinking Machines, where he continues to push the boundaries of scalable alignment techniques. [1]

  Pioneering Modern Reinforcement Learning Algorithms

Schulman's work on TRPO and PPO transformed deep reinforcement learning by making policy optimization more stable and sample-efficient. These methods have become standard baselines in both academia and industry, powering advances in robotics, gaming, and large-scale simulation environments. [2]
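To make the stability idea concrete, PPO's central trick is a clipped surrogate objective that limits how far a policy update can move from the old policy. A minimal NumPy sketch of that objective (the function name and the epsilon value of 0.2 follow the common convention; this is an illustration, not a full training loop):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective from PPO.

    ratio:     pi_new(a|s) / pi_old(a|s) for each sampled action
    advantage: advantage estimates for those actions
    epsilon:   clip range; 0.2 is a commonly used default
    """
    unclipped = ratio * advantage
    # Clipping the probability ratio caps the incentive to move the
    # policy far from the data-collecting policy in a single update.
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # Take the pessimistic (minimum) bound, then negate: we minimize.
    return -np.mean(np.minimum(unclipped, clipped))
```

With `ratio = 1` (no policy change) the loss is just the negated mean advantage; once the ratio leaves the `[1 - epsilon, 1 + epsilon]` band, the gradient incentive to push it further vanishes, which is what keeps updates stable.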

  Advancing RLHF and ChatGPT Development

At OpenAI, Schulman spearheaded the application of reinforcement learning from human feedback (RLHF), a technique that aligns large language models with human preferences. This approach underpinned the breakthrough performance and usability of ChatGPT, marking a major milestone in conversational AI and setting new norms for model alignment. [2]
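At the core of RLHF is a reward model trained on pairwise human preferences, typically with a Bradley-Terry style loss; the tuned policy is then optimized against that reward (often with PPO). A minimal sketch of the pairwise loss (function and variable names are illustrative, not from any specific codebase):

```python
import math

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss for reward-model training:
    -log sigmoid(r_chosen - r_rejected), where r_* are the scalar
    rewards the model assigns to the preferred and rejected responses."""
    margin = r_chosen - r_rejected
    # Numerically this is log(1 + exp(-margin)); the loss shrinks as the
    # reward model scores the human-preferred response ever higher.
    return math.log1p(math.exp(-margin))
```

When the model scores both responses equally the loss is log 2; it falls toward zero as the preferred response's reward pulls ahead, so minimizing it teaches the reward model to reproduce the human rankings.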

  Championing Safe and Alignable AI

Beyond algorithmic innovation, Schulman has consistently advocated for rigorous safety research. His recent roles at Anthropic and Thinking Machines focus on scalable oversight and alignment strategies, aiming to ensure that increasingly capable AI systems act in accordance with human values and benefit society at large. [1][3]

  References

  1. joschu.net

  2. news.berkeley.edu

  3. innovatorsunder35.com