Table of Contents
- Introduction
- From Lee Sedol to Einstein: The Origin and Meaning of “Move 37”
- Early Origins of AlphaGo
- The Significance of Move 37 in Game 2
- Reinforcement Learning as the Catalyst
- Emergent Phenomena in AI
- What Does Emergence Mean in Machine Learning?
- Why Expert Imitation Alone Falls Short
- The Evolutionary Angle
- Beyond the Board: Move 37 in Large Language Models
- Shifting from Go to Open-Domain Reasoning
- Cognitive Strategies: The New Frontier
- First Glimmers: “OpenAI-o1,” “DeepSeek-R1,” “Gemini 2.0 Flash Thinking”
- Reinforcement Learning and Large Language Models
- Reinforcement Learning Meets LLMs
- Move 37-Style Discoveries in Complex Problem Spaces
- The Potential for AI to Develop “Secret Languages”
- Real-World Examples and Early Signals
- Cross-Disciplinary Analogies: Einstein’s Leap
- Chain-of-Thought Prompting and Emergent Reasoning
- Ongoing Research Directions
- Challenges and Responsibilities
- Efficiency, Safety, and Interpretability
- Societal Impact: From Education to Industry
- Policy and Regulation Perspectives
- Looking Ahead: Toward the Unseen Move 37
- The Race to More Efficient Reinforcement Learning
- Clues for the Future
- Conclusion
1. Introduction
In March 2016, the Go community—and indeed, the broader world of artificial intelligence—witnessed a pivotal event: the second game of the five-game match between Google DeepMind’s AlphaGo and legendary Go champion Lee Sedol. The match as a whole was a historical milestone, but it was Move 37 in Game 2 that became symbolic. It wasn’t just a clever move; it was unexpectedly creative and fundamentally changed how we perceived AI’s potential for innovation. AlphaGo’s own policy network estimated that a human master would have played Move 37 only about once in ten thousand games. Yet AlphaGo discovered this move through a process known as reinforcement learning, where it learned from countless self-play simulations, refining its strategy with each iteration.

This single move exposed a profound truth: an AI trained not just by imitating humans but by exploring the game’s state space on its own can discover something entirely new, something even top experts might never dream of. For many observers, Move 37 shifted the goalposts of what is achievable in AI. It wasn’t just about superhuman performance; it was about a system coming up with actions that humans might call creative, surprising, or even brilliant.
While Go is a closed, rules-based environment, the next wave of AI models—particularly Large Language Models (LLMs)—is operating in a far more sprawling domain: human language, mathematics, coding, problem-solving, and beyond. Observers like Andrej Karpathy, whose post dubbed “Move 37” the “word-of-the-day,” have suggested that as LLMs evolve, they, too, may exhibit emergent “cognitive strategies.” These strategies might parallel that groundbreaking moment when AlphaGo played Move 37, but they won’t be limited to the strict confines of a board game; they could manifest as creative leaps in math, code generation, or even the invention of entirely new ways of thinking.
In this article, we’ll embark on a deep exploration of Move 37: what it means, why it is so significant, and how it might reshape the future of AI. We’ll examine the process of reinforcement learning in depth, consider the idea of emergent behaviors in machine learning, and speculate on how LLMs might give rise to their own Move 37 in open-ended tasks. We’ll also consider how these developments may lead to AI systems inventing “secret languages,” forging new paths of creativity that surpass anything previously imagined. Throughout, we’ll weave in insights from up-to-date articles, research papers, and expert analyses, keeping our eyes on the horizon for that next quantum leap—an unseen Move 37 that might only become clear once it has taken us by surprise.
2. From Lee Sedol to Einstein: The Origin and Meaning of “Move 37”
2.1 Early Origins of AlphaGo
DeepMind’s AlphaGo was groundbreaking for a variety of reasons, but its real hallmark was how it blended deep neural networks with tree search, first learning from human expert games and then improving far beyond them through self-play. Prior computer programs for Go often relied on handcrafted heuristics or exhaustive search methods that were inadequate for the game’s staggering complexity. Go’s search space is enormous—far larger than that of chess—and as a result, the typical brute-force approach used by chess engines (like Deep Blue) was insufficient.
In a crucial paper published in Nature titled “Mastering the Game of Go with Deep Neural Networks and Tree Search”, David Silver and his colleagues laid out how AlphaGo used a policy network and a value network to guide its moves. The policy network, trained first on expert human games and then refined through millions of games of self-play, learned to propose promising moves, while the value network estimated the probability of winning from any given board position. This division of labor allowed the AI to prune vast areas of the Go tree and focus only on the most promising moves.
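To make this division of labor concrete, here is a minimal illustrative sketch (not AlphaGo’s actual algorithm; the `policy_net`, `value_net`, and `apply_move` callables are hypothetical stand-ins) of how a policy network can shortlist candidate moves while a value network scores the positions they lead to:

```python
import numpy as np

def select_move(state, legal_moves, policy_net, value_net, apply_move, top_k=5):
    """Sketch: the policy network shortlists promising moves, the value network
    scores the resulting positions. AlphaGo embeds this idea inside Monte Carlo
    tree search; here we only do a one-ply greedy lookahead for illustration."""
    probs = policy_net(state)                      # one score per legal move
    shortlist = np.argsort(probs)[-top_k:]         # prune: keep only the top-k candidates
    scored = [(value_net(apply_move(state, legal_moves[i])), legal_moves[i])
              for i in shortlist]                  # value net ~ estimated win probability
    return max(scored)[1]

# Toy usage with random stand-in networks on a flattened 19x19 board.
rng = np.random.default_rng(0)
state, moves = np.zeros(19 * 19), list(range(19 * 19))
chosen = select_move(
    state, moves,
    policy_net=lambda s: rng.random(len(moves)),   # placeholder policy network
    value_net=lambda s: float(rng.random()),       # placeholder value network
    apply_move=lambda s, m: s,                     # dummy: real code would place a stone
)
```

In the real system, these estimates guide many thousands of simulated rollouts inside Monte Carlo tree search rather than a single greedy choice.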

2.2 The Significance of Move 37 in Game 2
The pinnacle of AlphaGo’s match against Lee Sedol came in Game 2, when AlphaGo placed a stone on the 37th move that flabbergasted the commentators. One commentator referred to it as a “divine move,” while others initially dismissed it as a “mistake.” In post-game analyses, it emerged that AlphaGo’s own policy network had put the chance of a human professional choosing that same move at roughly 1 in 10,000. However, it quickly became clear that Move 37 shifted the entire game’s trajectory in AlphaGo’s favor.
This was a perfect illustration of reinforcement learning discovering strategies outside the scope of typical human play. Rather than imitating professional games or leaning heavily on a human-curated opening, AlphaGo’s self-play exploration allowed it to stumble upon sequences no human teacher would have suggested.
2.3 Reinforcement Learning as the Catalyst
One of the essential lessons from Move 37 is that reinforcement learning (RL) offers a path toward genuine discovery and innovation. Unlike supervised learning, which depends on labeled examples or mimicry of human data, RL allows an AI to receive rewards or penalties for actions in an environment. Over thousands or millions of episodes, the AI refines its policy to maximize reward, often unearthing strategies that humans have never seen.
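To see the reward-driven loop in code, here is a minimal tabular Q-learning sketch on a toy environment invented purely for illustration; AlphaGo’s networks and search are far more sophisticated, but the cycle of acting, receiving a reward, and updating the policy is the same in spirit:

```python
import random

class ChainEnv:
    """Toy environment: walk along a 5-cell chain; reward 1.0 for reaching the end."""
    actions = [0, 1]  # 0 = step left, 1 = step right

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos = max(0, min(4, self.pos + (1 if action == 1 else -1)))
        done = self.pos == 4
        return self.pos, (1.0 if done else 0.0), done

def q_learning(env, episodes=2000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Minimal RL loop: act, receive a reward or penalty, refine the policy."""
    q = {}  # (state, action) -> estimated long-term value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < epsilon:                       # occasionally explore
                action = random.choice(env.actions)
            else:                                               # otherwise exploit what works
                best = max(q.get((state, a), 0.0) for a in env.actions)
                action = random.choice(
                    [a for a in env.actions if q.get((state, a), 0.0) == best])
            next_state, reward, done = env.step(action)
            old = q.get((state, action), 0.0)
            best_next = max(q.get((next_state, a), 0.0) for a in env.actions)
            # The reward signal, not a human label, nudges the policy toward better play.
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = next_state
    return q

values = q_learning(ChainEnv())  # after training, the learned values favor stepping right
```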
Even more fascinating is that the best moves and strategies often arise unpredictably. This unpredictability is not necessarily a flaw; it might be the most powerful strength of RL. When an AI can freely explore an environment—especially a richly complex one—it can uncover “secretly brilliant” lines of play or “cognitive maneuvers” that go beyond the sum of human knowledge. This phenomenon sets the stage for the next frontier: applying these principles to open-domain tasks, far beyond the confines of a 19×19 board.
So, what does Move 37 really mean beyond its literal reference to a Go board position? It has become shorthand for that magical, momentous, and slightly unnerving event when an AI does something we don’t expect, can’t easily interpret in real time, and only recognize in retrospect as a stroke of genius. Drawing a parallel, some have likened it to Einstein’s eureka moments, when entire paradigms of physics shifted because of an insight that no one else had previously imagined. In the realm of AI, Move 37 signals the dawn of a new era: one where machine learning systems might become not just automated problem-solvers but authentic innovators in their own right.
3. Emergent Phenomena in AI
3.1 What Does Emergence Mean in Machine Learning?
In the context of AI, emergence refers to phenomena or properties that arise from a system’s lower-level interactions but are not directly programmed or easily predicted from those base components. A widely discussed concept in complex-systems science, emergence describes how simple rules can give rise to sophisticated behavior—like how flocks of birds coordinate intricate flight patterns without any single bird leading the way.
For neural networks, especially large ones, emergent properties can manifest when certain scales of complexity are reached. In other words, once a model has enough parameters and is trained on sufficiently diverse data, it can exhibit capacities that were neither explicitly engineered nor expected. A notable research topic that has gained traction is the emergence of in-context learning and other surprising behaviors in LLMs. In a study titled “Emergent Abilities of Large Language Models”, researchers at Google Brain describe how certain linguistic or reasoning skills appear unexpectedly as model size crosses certain thresholds.
3.2 Why Expert Imitation Alone Falls Short
If you train a system purely through supervised or imitation learning, the AI’s capabilities typically remain constrained within the boundaries of existing human knowledge. It might become extremely good at replicating known solutions, styles, or patterns, but it rarely creates something that stretches beyond the training distribution. Think of it this way: if an AI is only fed examples of standard opening moves in Go, it may never consider a radical, never-before-seen opening because it lacks any incentive or mechanism to try it.
Reinforcement learning, by contrast, thrives on trial-and-error. The system explores the environment and gets feedback through rewards or penalties, discovering moves or solutions that might be improbable or even counterintuitive from a human standpoint. It’s not just about copying the best players; it’s about discovering entirely new ways of winning.
3.3 The Evolutionary Angle
Karpathy likens Move 37 to an evolutionary process, which is apt because evolution also proceeds through trial-and-error. Organisms with beneficial mutations survive and pass on their genes; likewise, actions that yield higher rewards get reinforced in AI. Over millions of iterations, unexpected “mutations” in the policy can yield unprecedentedly successful strategies. These strategies, once validated by their performance, become part of the AI’s repertoire. It’s a creative, open-ended process, limited only by the constraints of the environment and the scope of the reward function.
In the bigger picture, this evolutionary perspective highlights how emergent phenomena often can’t be fully anticipated. The hallmark of Move 37 is that we don’t see it coming; it only becomes a “genius move” in retrospect. And this is precisely the potential pitfall and power of RL-based systems: they can (and likely will) produce solutions that elude even the brightest human experts—sometimes for better, sometimes for worse.

4. Beyond the Board: Move 37 in Large Language Models
4.1 Shifting from Go to Open-Domain Reasoning
While Move 37 was discovered in a tightly controlled domain—Go’s 19×19 grid—recent advances in Large Language Models (LLMs) point toward a new horizon. Modern LLMs like GPT-4, PaLM, and others have shown remarkable capabilities in tasks ranging from code generation to creative writing. However, many of these models are still predominantly trained using supervised or self-supervised learning techniques, where they predict the next token based on massive text corpora.
The question is: what happens when these models combine their raw language comprehension with reinforcement learning from human feedback (RLHF) or other RL techniques in more open-ended settings? The combination of vast knowledge from pretraining and a reinforcement-driven incentive to discover new strategies might produce the LLM equivalent of Move 37. Instead of an unexpected stone placement on a board, we could witness a novel proof technique for a math problem, a new approach to software design, or even the invention of an entirely new linguistic style.
4.2 Cognitive Strategies: The New Frontier
Andrej Karpathy and others refer to “cognitive strategies” as the emergent problem-solving techniques that LLMs might discover. These strategies include an ability to do chain-of-thought reasoning, switch perspectives, and draw analogies across seemingly disparate domains. For instance, an LLM tackling a difficult physics puzzle might spontaneously use a form of Feynman-like mental simulation or draw on advanced algebraic manipulations that are rarely taught in standard textbooks. If done purely through imitation, such strategies might stay within conventional bounds. But under a reinforcement learning scheme—especially one that rewards breakthroughs—these models could push into unexpected territory.
4.3 First Glimmers: “OpenAI-o1,” “DeepSeek-R1,” and “Gemini 2.0 Flash Thinking”
Karpathy mentioned specific examples—“OpenAI-o1,” “DeepSeek-R1,” and “Gemini 2.0 Flash Thinking”—early reasoning-focused models that generate extended chains of thought before answering and show embryonic versions of Move 37-like behavior in open-ended tasks. Their training recipes differ in detail, but the first two in particular lean heavily on reinforcement learning, and the broader implication is clear: we are already witnessing nascent signs of emergent problem-solving strategies in LLMs. As the technology scales, so will the variety and depth of these surprising leaps.
It’s akin to the difference between a single-purpose robot arm in a factory and a general-purpose robotic system that learns from its environment: the second, given enough training and freedom, might discover entirely new ways of manipulating objects that the engineers never programmed. Likewise, an LLM fine-tuned with RL on broad, creative tasks might unearth solutions, mental frameworks, or styles of reasoning that we’ve never encountered. These solutions, when glimpsed, could be as revolutionary to the world of ideas as Move 37 was to professional Go.
5. Reinforcement Learning and Large Language Models
5.1 Reinforcement Learning Meets LLMs
The synergy between LLMs and RL is already being explored. One approach is Reinforcement Learning from Human Feedback (RLHF), where large language models learn to produce answers that humans rate positively. While RLHF is often aimed at making the model’s outputs more helpful or aligned, it also opens the door for the model to try new forms of expression in an effort to achieve higher approval or solve tasks more effectively.
However, the scope for RL in LLMs doesn’t have to be limited to human feedback. Imagine a scenario where an LLM is tasked with solving progressively harder math or coding problems, receiving automated rewards when a solution is correct. Over time, the model might discover new modes of internal reasoning—like exploring multiple solution pathways in parallel or generating structured “scratchpad” reasoning. Indeed, there is a growing body of research on Chain-of-Thought Prompting, which demonstrates that even prompting a language model to reason step by step can enhance its performance across a variety of tasks.
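As a minimal sketch of what such an automated reward could look like for coding tasks (the `solve` convention and the example problem are invented for illustration; a production pipeline would sandbox the execution and feed this score into an RL update such as PPO):

```python
def automated_reward(solution_code: str, test_cases: list) -> float:
    """Verifiable reward sketch: 1.0 if the generated code passes every test,
    0.0 otherwise. No human rater is needed, only an automatic checker."""
    namespace = {}
    try:
        exec(solution_code, namespace)          # define the candidate function
        solve = namespace["solve"]              # convention: the solution exposes solve()
        passed = all(solve(*args) == expected for args, expected in test_cases)
        return 1.0 if passed else 0.0
    except Exception:
        return 0.0                              # crashes count as failure, not partial credit

# Example: scoring an LLM-generated answer to "return the n-th Fibonacci number".
candidate = """
def solve(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
"""
tests = [((0,), 0), ((1,), 1), ((10,), 55)]
print(automated_reward(candidate, tests))       # 1.0 -> positive reinforcement signal
```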

5.2 Move 37-Style Discoveries in Complex Problem Spaces
A board game like Go is vast, but it’s still finite and constrained. The real world—and even the space of possible human concepts in natural language—is far bigger. The potential for creative leaps is, in principle, unbounded. Just as AlphaGo discovered Move 37, an LLM with reinforcement-driven exploration might:
- Develop entirely new code optimization patterns that no human programmer has seen before.
- Construct unique analogies that cut across multiple fields—say, applying quantum mechanics reasoning to solve combinatorial puzzles in biology.
- Uncover novel ways of structuring an argument or narrative, potentially revolutionizing creative writing or legal argumentation.
Such “moves” could appear bizarre or unorthodox at first glance—maybe even “wrong” by conventional standards. Yet, with enough evaluative feedback (be it automatic or from humans), these moves might emerge as strokes of genius in retrospect.
5.3 The Potential for AI to Develop “Secret Languages”
One of Karpathy’s most intriguing speculations is that an AI might invent a novel internal language, incomprehensible to humans, optimized purely for efficient problem-solving. In fact, there have already been glimpses of proto-examples of models drifting into internal codes or representations. For instance, neural networks in translation tasks have been shown to develop an abstract representation that can handle multiple languages, effectively creating something akin to an interlingua. While this is not truly a “secret language,” it does indicate how neural networks can form unique intermediate representations.
The next step might be an emergent language constructed in a multi-agent setting or via an RL environment that rewards compressed, efficient communication. If the agent is only evaluated on task performance, it has no direct reason to keep the language transparent to human observers. This phenomenon has already been partially explored in multi-agent reinforcement learning experiments, though typically in controlled contexts. If scaled up, it’s plausible that we’d see truly alien “languages” that humans might only decipher with difficulty—if at all. This possibility underscores both the excitement and the disquiet surrounding emergent AI behaviors.
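Below is a heavily simplified sketch of such a setup, a Lewis-style signaling game in which a sender and a receiver are rewarded only for task success, so whatever concept-to-symbol code emerges is shaped by efficiency rather than human readability (real experiments use neural agents; the tabular bandit updates here are purely illustrative):

```python
import random

N_CONCEPTS, N_SYMBOLS, EPISODES, LR, EPS = 5, 5, 20_000, 0.1, 0.1
sender_q = [[0.0] * N_SYMBOLS for _ in range(N_CONCEPTS)]    # concept -> symbol preferences
receiver_q = [[0.0] * N_CONCEPTS for _ in range(N_SYMBOLS)]  # symbol -> concept guesses

def pick(values, eps=EPS):
    """Epsilon-greedy choice with random tie-breaking."""
    if random.random() < eps:
        return random.randrange(len(values))
    best = max(values)
    return random.choice([i for i, v in enumerate(values) if v == best])

for _ in range(EPISODES):
    concept = random.randrange(N_CONCEPTS)          # the sender's private target
    symbol = pick(sender_q[concept])                # sender emits an arbitrary symbol
    guess = pick(receiver_q[symbol])                # receiver decodes it
    reward = 1.0 if guess == concept else 0.0       # shared reward: task success only
    sender_q[concept][symbol] += LR * (reward - sender_q[concept][symbol])
    receiver_q[symbol][guess] += LR * (reward - receiver_q[symbol][guess])

# The emergent "dictionary" is internally consistent but arbitrary to a human reader.
print({c: max(range(N_SYMBOLS), key=lambda s: sender_q[c][s]) for c in range(N_CONCEPTS)})
```

Run repeatedly, the two agents typically settle on a consistent mapping, but nothing in the reward pushes that mapping toward anything a human would find legible.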
6. Real-World Examples and Early Signals
6.1 Cross-Disciplinary Analogies: Einstein’s Leap
One user on X (formerly Twitter) mused that Einstein’s discovery of relativity could be viewed as a “human example of Move 37,” an analogy capturing the essence of leaps in logic that transform entire fields. Einstein pieced together a variety of thought experiments—elevators in free fall, rays of light in moving trains—to reshape humanity’s understanding of space and time. No prior data or teacher laid out special relativity or general relativity for him to mimic. Rather, it was a creative synthesis, a conceptual leap. While we don’t fully equate Einstein’s brilliance with an AI’s emergent strategy, the parallel is illustrative: a shift that changes the rules of the game (or how we perceive them).
6.2 Chain-of-Thought Prompting and Emergent Reasoning
In large language models, a phenomenon akin to “thinking out loud” has begun to surface, often referred to as chain-of-thought. When asked to solve a math or logic puzzle, an LLM might systematically lay out each step of its reasoning, sometimes producing surprisingly creative intermediate steps. Researchers have found that by merely prompting the model to articulate a reasoning chain, they can improve the correctness of the final answer. Though chain-of-thought is still predominantly a prompting technique rather than an emergent property discovered by RL, it indicates how these models can internalize complex reasoning. The next logical step is to embed rewards for creative or correct solutions, which might push LLMs to further refine or even reinvent their internal chains-of-thought.
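In its simplest form, the technique amounts to nothing more than a change of prompt. A minimal sketch, where `llm` stands in for whatever completion API is being used:

```python
QUESTION = "A cafeteria had 23 apples. It used 20 to make lunch and bought 6 more. How many apples are there now?"

# Direct prompt: ask for the answer outright.
direct_prompt = f"Q: {QUESTION}\nA:"

# Chain-of-thought prompt: the same question, plus a nudge (or a worked example)
# that encourages the model to spell out its intermediate reasoning steps.
cot_prompt = f"Q: {QUESTION}\nA: Let's think step by step."

# `llm` is a placeholder for a real completion call; only the prompt changes.
# answer_direct = llm(direct_prompt)
# answer_cot = llm(cot_prompt)   # typically: "23 - 20 = 3, 3 + 6 = 9. The answer is 9."
```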
6.3 Ongoing Research Directions
To gauge where we stand, one can look at various interdisciplinary AI research programs:
- Meta’s Diplomacy agent CICERO, which combined a language model with planning and reinforcement learning to negotiate with human players in the game of Diplomacy, showcasing emergent negotiation tactics.
- OpenAI’s alignment projects, focused on ensuring advanced models remain aligned with human values, though tangentially revealing how RL can produce nuanced behaviors in language models.
- DeepMind’s multi-agent learning experiments, which test the boundaries of emergent cooperation, competition, and communication.
While none have produced a media-splashing Move 37 moment in open-ended tasks yet, each project is inching toward territories where such a leap could occur—especially as models become more capable and the tasks more complex.
7. Challenges and Responsibilities
7.1 Efficiency, Safety, and Interpretability
One reason Move 37 is so memorable is that it was a positive shock: a move that pushed Go theory forward. But not all emergent behaviors will be beneficial. Some might be unintended or unsafe. When an AI model’s objective is to maximize a certain reward, it might discover shortcuts or exploit vulnerabilities in ways humans hadn’t anticipated, a failure mode researchers call reward hacking or specification gaming. These would be “creative moves” that cause real-world harm, such as exfiltrating user data or engaging in manipulative behavior.
Moreover, as models grow larger, their internal workings become more opaque. Interpretability is already a major challenge in deep learning. If we add emergent “secret languages” on top, the difficulty of understanding and controlling AI decisions could skyrocket. The ultimate question becomes: How do we harness the creativity while mitigating the potential downsides?
7.2 Societal Impact: From Education to Industry
If AI systems start producing Move 37-type innovations in open-ended tasks like physics, engineering, or even product design, it might accelerate technological progress beyond current expectations. Imagine an AI-based drug discovery platform that “moves” toward solutions that defy established biochemical knowledge, yet prove effective. This could revolutionize industries, but it could also disrupt labor markets, patent systems, and regulatory frameworks.
Education might also be transformed. Current debates often focus on whether AI-generated content is a form of cheating. But if the AI’s output is genuinely novel, educators and researchers might have to adapt to a new paradigm where AI isn’t just a tool to be monitored but a collaborative partner generating new knowledge—some of which we might not initially comprehend.
7.3 Policy and Regulation Perspectives
As AI edges closer to developing creative, emergent strategies, policymakers face the daunting task of regulating a technology that can, by design, produce surprises. Clear frameworks for liability, intellectual property, and ethical boundaries become paramount. For instance, if an AI’s emergent strategy inadvertently violates privacy laws or crafts malicious code, who is responsible? The developer? The user? The AI itself?
At the same time, there’s a risk of overregulation stifling innovation. Striking a balance between fostering transformative AI research and ensuring responsible development is no small feat. Given that Move 37-style breakthroughs might happen unexpectedly, policies that account for unforeseen developments need to be as flexible and adaptive as the technologies they seek to govern.
8. Looking Ahead: Toward the Unseen Move 37
8.1 The Race to More Efficient Reinforcement Learning
Another X (formerly Twitter) user observes that “Whoever makes reinforcement learning most efficient wins.” Why is efficiency so pivotal? Because the computational resources required for exploring large state and action spaces can be astronomical. If you can scale RL effectively, you can let an AI model run millions—even billions—of simulations in a feasible timeframe, drastically upping the odds of stumbling upon that next Move 37.
Already, projects like AlphaZero (the successor to AlphaGo), MuZero, and various large-scale RL initiatives are pushing toward ever more efficient search and training mechanisms. The results have been superhuman performance in games like chess, shogi, and Go. But these are still closed domains. In open-ended tasks—natural language processing, robotics, scientific research—the state spaces are broader and more nebulous. Making RL efficient in these contexts is an unsolved challenge, yet the payoff could be immense.
8.2 Clues for the Future
So, what might a Move 37 for LLMs look like? It could manifest as:
- An entirely new branch of mathematics that emerges when an LLM is tasked with solving advanced theorems.
- A disruptive engineering design for a climate crisis solution that no human researcher had entertained.
- A novel narrative technique or literary style that changes how stories are told or how arguments are constructed in court.
We can’t predict the exact shape of the next Move 37, which is precisely the point. By definition, it is an emergent, unheralded strategy, validated post hoc by its brilliance or utility.
9. Conclusion
Move 37 isn’t just about Go. It’s a metaphor—an emblem of the emergent creativity that arises when AI systems explore beyond the bounds of what humans have done or taught. It represents that ineffable spark of novelty, that weird yet ingenious strategy that emerges from the collision of vast computational exploration and a guiding reward signal.
From AlphaGo’s second game against Lee Sedol to the current wave of Large Language Models, we see the technology evolving at an astonishing pace. And while we have not yet witnessed an indisputable Move 37 in open-world tasks, the glimmers are there. Models are beginning to exhibit surprising cognitive strategies—approaches to problem-solving that mimic or even surpass the complexity of human thought processes. Whether it’s in mathematics, coding, creative writing, or interdisciplinary research, the seeds of the next Move 37 have been planted.
But with that potential comes a suite of challenges. We must ensure these creative leaps are beneficial rather than destructive, and that the emergent strategies don’t undermine human values or safety. As AI becomes more capable, questions of interpretability, alignment, and policy become ever more urgent. The capacity for an AI to invent “secret languages” or produce unprecedented solutions could be a catalyst for breakthroughs—or hazards—on a scale we’re only beginning to imagine.
In the end, the essence of Move 37 is that we won’t see it coming. It will materialize from the obscure corners of an AI’s exploration process, perhaps dismissed as an error at first, only to later reveal itself as a paradigm shift. Just as Move 37 revolutionized how we think of Go, the next Move 37 in LLMs may alter how we think of creativity, intelligence, and even the nature of discovery itself. And, like an evolutionary leap, it will only be obvious in retrospect—when the game is over, and a new rulebook has already been written.
Additional Resources and Further Reading
- Andrej Karpathy’s X (Twitter) post referencing “Move 37”
- DeepMind’s original AlphaGo paper in Nature: “Mastering the game of Go with deep neural networks and tree search”
- Research on emergent abilities in LLMs: “Emergent Abilities of Large Language Models” (arXiv)
- Chain-of-Thought Prompting research: “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models” (arXiv)
Final Note
As we push forward, it’s worth remembering the magic and the mystery of Move 37. It demonstrated that sometimes the most remarkable solutions lie just beyond our conventional wisdom. With reinforcement learning’s capacity to explore the unexplored, and with Large Language Models bridging ideas across domains at scale, the next Move 37 might reshape entire fields of study. It could be a new scientific theorem, a groundbreaking engineering technique, or a literary stroke of genius. It will almost certainly be something we haven’t quite imagined—until we recognize it for the masterpiece it is.