Noam Brown is a research scientist at OpenAI whose work focuses on multi-agent reinforcement learning and strategic reasoning. With Tuomas Sandholm, he co-created the first AIs to defeat top humans in both two-player and multiplayer no-limit poker (Libratus and Pluribus), and he later helped build CICERO, the first agent to reach human-level performance in the strategy game Diplomacy 1 2 3. His research shows how self-play and game-theoretic planning can scale from games toward settings that involve negotiation and decision-making. A Carnegie Mellon PhD, Brown was named one of MIT Technology Review's 35 Innovators Under 35 for his contributions to AI 4.
Superhuman Poker Bots: Libratus and Pluribus
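Libratus defeated four elite professionals over 120,000 hands of heads-up no-limit Texas Hold'em in 2017, and its successor Pluribus beat elite professionals at six-player no-limit Hold'em in 2019 2. These results showed that AI could reach superhuman play in large imperfect-information games, where a player cannot see the opponents' cards. Both systems started from a blueprint strategy computed offline with a variant of counterfactual regret minimization (CFR) and refined it during play: Libratus used safe, nested subgame solving, which re-solves the portion of the game actually reached in finer detail, and Pluribus used depth-limited search to make real-time search practical with six players. These techniques have since influenced professional play.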
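CFR, the family of algorithms behind both blueprints, repeatedly applies regret matching: each action is played in proportion to how much the player regrets not having played it so far. The sketch below runs plain regret matching in self-play on a hypothetical "weighted rock-paper-scissors" matrix game (rock beating scissors pays 2) rather than on poker; `PAYOFF`, `self_play`, and `exploitability` are illustrative names, not code from Libratus or Pluribus. The average strategies should approach this game's unique equilibrium, (0.25, 0.5, 0.25), with exploitability shrinking toward 0 as the iteration count grows.

```python
# Toy sketch: regret matching in self-play on a two-player zero-sum matrix game.
# Assumption: a small matrix game stands in for poker; real CFR applies the same
# update at every information set of a much larger extensive-form game.

# Hypothetical weighted rock-paper-scissors: Rock beating Scissors pays 2.
# PAYOFF[a][b] is player 0's payoff when player 0 plays a and player 1 plays b;
# the game is zero-sum, so player 1 receives -PAYOFF[a][b].
PAYOFF = [
    [0, -1, 2],   # Rock
    [1, 0, -1],   # Paper
    [-2, 1, 0],   # Scissors
]


def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(cum_regret)] * len(cum_regret)  # no positive regret: uniform


def self_play(payoff, iterations):
    """Run regret matching for both players and return their average strategies."""
    n = len(payoff)
    regrets = [[0.0] * n, [0.0] * n]
    strategy_sums = [[0.0] * n, [0.0] * n]
    for _ in range(iterations):
        s0 = regret_matching(regrets[0])
        s1 = regret_matching(regrets[1])
        # Expected payoff of each pure action against the opponent's current mix.
        u0 = [sum(payoff[a][b] * s1[b] for b in range(n)) for a in range(n)]
        u1 = [sum(-payoff[a][b] * s0[a] for a in range(n)) for b in range(n)]
        # Expected payoff of each player's current mixed strategy.
        v0 = sum(s0[a] * u0[a] for a in range(n))
        v1 = sum(s1[b] * u1[b] for b in range(n))
        for k in range(n):
            regrets[0][k] += u0[k] - v0
            regrets[1][k] += u1[k] - v1
            strategy_sums[0][k] += s0[k]
            strategy_sums[1][k] += s1[k]
    # The average strategy, not the last iterate, is what converges.
    return [[x / iterations for x in sums] for sums in strategy_sums]


def exploitability(payoff, s0, s1):
    """Sum of both players' best-response payoffs (0 exactly at a Nash equilibrium)."""
    n = len(payoff)
    best0 = max(sum(payoff[a][b] * s1[b] for b in range(n)) for a in range(n))
    best1 = max(sum(-payoff[a][b] * s0[a] for a in range(n)) for b in range(n))
    return best0 + best1


if __name__ == "__main__":
    avg0, avg1 = self_play(PAYOFF, 10_000)
    print([round(p, 3) for p in avg0], exploitability(PAYOFF, avg0, avg1))
```

The function returns average strategies because, in self-play, the per-iteration strategies can keep cycling; in two-player zero-sum games only the average carries the convergence guarantee.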
Human-Level Diplomacy With CICERO
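CICERO, developed at Meta AI, paired a strategic planning module, trained through self-play reinforcement learning regularized toward human-like play, with a language model that generates negotiation messages conditioned on its planned moves. Playing anonymously in an online league on webDiplomacy.net, it scored more than double the human average and ranked in the top 10% of participants who played more than one game 3. The work shows that strategic reasoning and natural-language dialogue can be combined in a single agent, pointing toward AI systems that collaborate with people in complex settings.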
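CICERO's planner used an algorithm called piKL, which trades expected value against staying close to an "anchor" policy learned by imitating human play. For a single decision, maximizing expected value minus λ times the KL divergence from the anchor has the closed-form solution π(a) ∝ anchor(a) · exp(Q(a)/λ). The sketch below computes only that one-step policy; the Q values, anchor probabilities, and λ are made-up inputs, the function names are hypothetical, and the multi-agent equilibrium search CICERO wrapped around this step is omitted.

```python
import math


def kl_regularized_policy(q_values, anchor, lam):
    """Maximize sum(pi * Q) - lam * KL(pi || anchor) over distributions pi.

    Closed form: pi(a) is proportional to anchor(a) * exp(Q(a) / lam).
    Actions with zero anchor probability stay at zero probability.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    # Subtract the largest exponent before exp() to avoid overflow.
    top = max(q / lam for q, p in zip(q_values, anchor) if p > 0)
    weights = [
        p * math.exp(q / lam - top) if p > 0 else 0.0
        for q, p in zip(q_values, anchor)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def regularized_objective(pi, q_values, anchor, lam):
    """Expected Q minus lam * KL(pi || anchor), treating 0 * log 0 as 0.

    Assumes anchor(a) > 0 wherever pi(a) > 0, so the KL term is finite.
    """
    value = sum(p * q for p, q in zip(pi, q_values))
    kl = sum(p * math.log(p / a) for p, a in zip(pi, anchor) if p > 0)
    return value - lam * kl


if __name__ == "__main__":
    # Hypothetical numbers: Q estimates for three candidate moves and a
    # human-imitation policy that strongly prefers the second move.
    q = [1.0, 0.4, 0.0]
    human = [0.1, 0.8, 0.1]
    for lam in (10.0, 1.0, 0.1):
        print(lam, [round(p, 3) for p in kl_regularized_policy(q, human, lam)])
```

A large λ keeps the policy close to human-like play, while a small λ approaches the greedy, highest-Q move; tuning this trade-off is what lets such an agent stay both strong and predictable to human partners.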
Advancing Scalable Reasoning at OpenAI
At OpenAI, Brown focuses on giving language models deliberate, multi-step reasoning. He helped develop the o1 series, models that reason through a private chain of thought before answering and whose performance improves with more reinforcement-learning training and with more inference-time compute 5. The public o1-preview and o1-mini releases in September 2024 posted strong results on math, science, and competitive-programming benchmarks. The raw chain of thought is hidden from users, who instead see a model-generated summary of it. Brown's team views o1 as evidence for a new scaling dimension in which longer reasoning time, not just more parameters, drives capability gains.
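Why extra inference compute can raise accuracy can be illustrated with a simpler technique than o1's: sample several independent answers and take a majority vote (often called self-consistency). This is not how o1 works; OpenAI describes o1 as spending its extra compute on a longer private chain of thought. Assuming each sample is independently correct with probability p on a yes/no question, the chance that a majority of n samples is correct is a binomial tail; `majority_vote_accuracy` is a hypothetical helper written for this illustration.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Probability that a majority of n independent yes/no samples is correct.

    Assumes each sample is correct with probability p, independently.
    n must be odd so that a vote can never tie.
    """
    if n % 2 == 0:
        raise ValueError("use an odd n to avoid ties")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


if __name__ == "__main__":
    for n in (1, 3, 11, 101):
        print(n, round(majority_vote_accuracy(0.6, n), 4))
```

With p = 0.6, three samples already lift accuracy to about 0.648, and it keeps climbing as n grows; with p below 0.5 the same procedure gets worse, so more compute only helps when each attempt is better than chance.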