Since Alan Turing proposed the Turing Test in 1950, artificial intelligence (AI) has been guided by a bold vision: machines capable of performing the wide range of intellectual tasks that humans do. Over the decades, several waves of progress have been celebrated as steps toward this goal—from expert systems in the 1980s, to deep learning in the 2010s, and most recently large language models. Each wave has delivered remarkable capabilities. Yet each has also revealed a continuing gap between high performance in specific domains and the broader flexibility implied by Turing’s vision.
The Turing Test ultimately reflects what may be called Open-world AI: the ability to handle unfamiliar tasks, adapt to new situations, and learn continuously from limited experience. In contrast, learning in most modern AI systems is defined as optimizing performance on a fixed task within a well-defined environment and under clear success criteria [1] — what can be described as Closed-world AI.
In closed-world AI, engineers improve performance by analyzing system failures, collecting additional related examples, and retraining the model. This iterative process has driven much of the rapid progress of recent years. However, it does not fully address the deeper challenge of open-world intelligence. In real-world environments, tasks are not always specified in advance, examples may be scarce, and new situations arise continually.
This contrast leads to a striking phenomenon in modern AI. On one hand, today’s leading models can write essays, generate code, recognize images, and assist with complex professional work. On the other hand, they may exhibit unexpected failures on simple but unfamiliar tasks, such as miscounting the number of “r”s in a common word like strawberry [2]. Such inconsistencies reveal the difference between broad statistical competence and robust, adaptive intelligence.
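The letter-counting failure is striking precisely because the task is trivial for conventional software; a one-line check (illustrative only) makes the contrast concrete:

```python
# Counting exact character occurrences is a deterministic string operation,
# yet statistical language models have historically stumbled on it.
word = "strawberry"
r_count = word.count("r")
print(f'"{word}" contains {r_count} occurrences of "r"')  # prints 3
```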
Current closed-world AI systems also remain largely static. They undergo a single, computationally expensive training process and then change only through costly fine-tuning, which can degrade or overwrite previously acquired knowledge. As a result, today’s systems struggle to adapt to newly arising situations, to personalize effectively for diverse needs, or to operate reliably in dynamic environments.
This dissertation addresses a central question:
How can we design large-scale AI systems that continuously learn new and unforeseen tasks from limited data and minimal human guidance — moving closer to the open-world intelligence envisioned by Turing?
To guide the design of open-world AI, this dissertation proposes three learning principles:
Rich features. AI systems must develop diverse and expressive internal representations — analogous to a large toolbox — so that knowledge learned in one context can be reused in others.
Disentangled organization. These features must be structured and modular, like a well-organized toolbox, allowing the system to update specific capabilities without disrupting existing knowledge.
Inference-time learning. Learning should occur not only during initial training but also during deployment. This capability allows the system to incorporate new information dynamically instead of remaining static.
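The third principle can be illustrated with a generic key-value associative memory that keeps learning during deployment. This is a hedged sketch of the general idea only, not the specific mechanism inside Memory Mosaics; the class name, the soft nearest-neighbor readout, and the `beta` temperature are illustrative assumptions:

```python
import numpy as np

class AssociativeMemory:
    """Toy associative memory: learns at inference time by storing pairs."""

    def __init__(self, dim):
        self.keys = np.empty((0, dim))
        self.values = np.empty((0, dim))

    def write(self, key, value):
        # Inference-time learning: each observed (key, value) pair is stored
        # immediately, without gradient updates or retraining.
        self.keys = np.vstack([self.keys, key])
        self.values = np.vstack([self.values, value])

    def read(self, query, beta=4.0):
        # Soft nearest-neighbor retrieval: keys similar to the query
        # contribute more weight to the returned value.
        scores = self.keys @ query
        weights = np.exp(beta * (scores - scores.max()))
        weights /= weights.sum()
        return weights @ self.values

mem = AssociativeMemory(dim=2)
mem.write(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
mem.write(np.array([0.0, 1.0]), np.array([1.0, 0.0]))
print(mem.read(np.array([1.0, 0.0])))  # close to [0., 1.]
```

Because writes only append rows, new associations accumulate without overwriting earlier ones, a toy analogue of updating specific capabilities without disrupting existing knowledge.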
Based on these principles, the dissertation introduces a practical large-scale architecture called Memory Mosaics, designed to enable continuous learning and to move beyond the limitations of current large language model training paradigms.
A major contribution of this work is demonstrating that open-world learning principles operate at the scale of modern foundation models, rather than only in small experimental settings.
The Memory Mosaics system was trained with:
approximately 10 billion parameters
trillions of training tokens
thousands of state-of-the-art GPUs
a multimillion-dollar computational investment
These large-scale experiments show that the proposed architecture is practical and scalable. Empirically, Memory Mosaics achieves substantial improvements over comparable transformer-based foundation models, validating the effectiveness of the proposed learning principles for open-world AI.
Main Outcomes
1. Learning principles for future foundation models
The dissertation identifies three core principles — rich representations, structured knowledge organization, and inference-time learning — that provide a conceptual foundation for building open-world AI systems capable of continuously learning diverse tasks.
2. A practical AI model that outperforms modern foundation models
The proposed Memory Mosaics architecture implements these principles at scale. Across multiple evaluations, it achieves improvements exceeding 10 percentage points over comparable large transformer models when learning new tasks.
3. A path toward adaptive and decentralized AI
Because knowledge can be updated incrementally rather than through full retraining, the approach enables AI systems that evolve over time. This capability supports personalized models, long-term adaptation, and deployment outside large centralized infrastructure.
4. Research Recognition
A central component of this work, Memory Mosaics at Scale, was selected for an oral presentation at NeurIPS 2025, one of the most competitive international conferences in artificial intelligence. Oral presentations are awarded to a small fraction of submissions (0.36%) and are delivered to an audience of approximately 20,000 researchers and practitioners.
Broader Significance
The broader impact of this work lies in redefining the trajectory of artificial intelligence.
Technically, it opens a new research direction focused on lifelong learning, modular knowledge, and inference-time learning. Rather than scaling static models, this paradigm emphasizes systems that grow and improve through experience.
Economically and environmentally, continuous learning reduces the need for repeated large-scale retraining, lowering long-term computational costs and energy consumption.
Societally, the architecture enables personalized and decentralized AI. Individuals and organizations can maintain models that adapt to their own data and evolving needs while reducing reliance on centralized infrastructure.
Most fundamentally, this dissertation reframes the goal of AI. Instead of building systems that excel at predefined tasks, it advances a framework for systems that remain useful as tasks change.
Personal Statement
My dissertation is motivated by a question that has shaped artificial intelligence since Alan Turing: how can we build machines that keep learning after deployment — adapting to new situations, new information, and new user needs — rather than remaining essentially fixed after a single training run? I refer to this goal as Open-world AI. Modern foundation models have achieved impressive breadth, but the dominant paradigm still follows a largely closed-world workflow: define tasks, collect large amounts of related data, train at scale, and later update behavior through costly fine-tuning or retraining. This workflow is increasingly mismatched to real deployments, where requirements shift continuously and where personalized, long-lived systems are needed.
The central contribution of my dissertation is to propose learning principles and a practical architecture for open-world AI. Concretely, I develop three principles: rich and reusable features, disentangled organization of knowledge, and inference-time learning. I operationalize these principles through the Memory Mosaics architecture. A defining aspect of this work is that these ideas are not presented only as conceptual arguments: I tested them under realistic foundation-model conditions, including training a 10B-parameter model on 1 trillion tokens using large-scale GPU clusters, and I conducted extensive evaluations demonstrating clear improvements over comparable transformer-based baselines. This combination of principled design and large-scale validation is intended to make Memory Mosaics a practical direction for open-world AI rather than an exclusively small-scale research topic.
This research also shaped my scholarly perspective. It convinced me that the next stage of progress in AI will depend less on scaling static pretraining alone and more on architectures and learning paradigms that support stable accumulation of knowledge over time — systems that can improve through use, not only through periodic retraining. It also reinforced the importance of evaluating AI as an evolving system operating under distribution shift and scarce examples, rather than as a one-time predictor optimized for fixed benchmarks.
A distinguishing feature of this dissertation is how it was executed. Unlike many foundation-model efforts (e.g., Llama, Gemini, GPT) that rely on large teams spanning data, infrastructure, training, and evaluation, this work — including the large-scale training — was carried out almost entirely by two people: my advisor (Léon Bottou) and me. I led the formulation of the learning principles, designed the architecture and training methodology, implemented the system end-to-end, ran and debugged large-scale experiments, performed the analyses and evaluations, and wrote the dissertation and associated papers. My advisor provided strategic guidance, critical feedback, and technical critique throughout. This unusually small-team setting required building the full research stack — from ideas to systems to scale — and it strengthened my commitment to research that is both conceptually grounded and empirically decisive.
Footnotes
1. These modern AI systems follow the i.i.d. (independent and identically distributed) assumption formalized in Vladimir Vapnik’s statistical learning framework in the 1990s. Interestingly, Vapnik himself warned over 30 years ago that this assumption would eventually “come back to haunt us” in real-world problems.
2. Today’s frontier models can now correctly count the “r”s in strawberry. But how they were fixed reveals the core limitation of closed-world AI: “In closed-world AI, engineers improve performance by analyzing system failures, collecting additional related examples, and …”