John Schulman is a co-founder of OpenAI and a pioneering AI researcher best known for creating the Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) algorithms. He led the reinforcement-learning and post-training efforts that produced ChatGPT, left OpenAI for Anthropic in 2024, and now serves as Chief Scientist at the next-generation AI company Thinking Machines, where he continues to push the boundaries of scalable alignment techniques.
Pioneering Modern Reinforcement Learning Algorithms
Schulman's work on TRPO and PPO transformed deep reinforcement learning by making policy optimization markedly more stable and sample-efficient. Both methods have become standard baselines in academia and industry, powering advances in robotics, game playing, and large-scale simulation environments.
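The stability PPO is known for comes largely from its clipped surrogate objective, which caps how far a single update can move the new policy from the old one. A minimal sketch (function name and the NumPy dependency are illustrative choices, not from the source):

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate objective from PPO (illustrative sketch).

    The probability ratio r = exp(logp_new - logp_old) is clipped to
    [1 - eps, 1 + eps], removing the incentive to push the new policy
    far from the old one in a single gradient step.
    """
    ratio = np.exp(logp_new - logp_old)          # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Taking the elementwise minimum makes the bound pessimistic:
    # the objective never rewards moving outside the clip range.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

In practice this quantity is maximized (or its negation minimized) with minibatch gradient ascent over trajectories collected by the old policy.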
Advancing RLHF and ChatGPT Development
At OpenAI, Schulman spearheaded the application of reinforcement learning from human feedback (RLHF), a technique that aligns large language models with human preferences. This approach underpinned the breakthrough performance and usability of ChatGPT, marking a major milestone in conversational AI and setting new norms for model alignment.
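A core ingredient of RLHF is a reward model trained on human preference comparisons, typically with a Bradley-Terry-style loss that pushes the score of the preferred response above the rejected one. A minimal sketch (function name is a hypothetical illustration, not from the source):

```python
import numpy as np

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry preference loss used to fit RLHF reward models.

    Minimizes -log sigmoid(r_chosen - r_rejected), written here in the
    numerically stable form log(1 + exp(-(r_chosen - r_rejected))).
    """
    margin = np.asarray(r_chosen) - np.asarray(r_rejected)
    return float(np.mean(np.log1p(np.exp(-margin))))
```

Once the reward model is fit, a policy-gradient method such as PPO optimizes the language model against it, closing the loop between human judgments and model behavior.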
Championing Safe and Alignable AI
Beyond algorithmic innovation, Schulman has consistently advocated for rigorous safety research. His recent roles at Anthropic and Thinking Machines have focused on scalable oversight and alignment strategies, aiming to ensure that increasingly capable AI systems act in accordance with human values and benefit society at large.