Foundations of Agentic Systems Theory

FAST @ AAAI 2026

Jan 27, 2026



As with any complex system, the most interesting and consequential behaviors often arise not from the parts in isolation, but from the patterns of interaction between them. The current development of agentic AI has largely ignored these considerations, instead focusing on designing more (individually) capable agents. Failing to consider these effects as AI agents become more widespread will lead to a significant underestimation of both their capabilities and risks.

There is an extensive body of knowledge underlying these interaction effects across various fields, but it’s not currently clear how applicable existing theoretical tools are to agentic AI systems. Tools from control theory and game/economic theory typically impose strong structural assumptions on both agents and the overall system (such as the form of objective functions, state evolution/dynamics, or degree of rationality) in efforts to obtain concrete results. On the other hand, methods from the social sciences use observations of human behavior, cultural contexts, and social norms to make more measured claims about probable patterns within the complexity and variability of human experience. Agentic AI systems don’t cleanly map to either of these settings. The underlying LLM in an AI agent does not possess the same rational behavior as idealized control/game/economic agents, nor does it exhibit the culturally/emotionally/evolutionarily shaped behaviors that characterize human agents.

The Foundations of Agentic Systems Theory (FAST) workshop broadly aims to help evaluate the degree to which existing theory can be used to describe the behavior of agentic AI systems. Drawing from a variety of fields (notably beyond computer science, including developmental psychology, neuroscience, and social dynamics), FAST will explore whether and how existing mechanisms of emergent behavior from other systems carry over to systems of LLM-based agents, the properties of the underlying agents (and their LLMs) that facilitate these behaviors, and our ability to control/induce desirable system-wide outcomes. We strongly encourage interdisciplinary participation (via both contributions and invited talks), with the ultimate goal of contributing to a better understanding of the underlying processes that govern the system-level behavior (and risks) of agentic AI.


Invited Speakers

We are pleased to have the following keynote speakers as part of the FAST program.


Michael Wooldridge

Faculty @
University of Oxford /
Hertford College

Biography
Dr. Michael Wooldridge is the Ashall Professor of the Foundations of Artificial Intelligence at the University of Oxford and a Senior Research Fellow at Hertford College. Formerly Head of Oxford’s Computer Science Department (2014–21) and a Professor at the University of Liverpool, he is a Fellow of ACM, AAAI, EurAI, AISB, BCS, and Academia Europaea. His awards include the Lovelace Medal (2020), the AAAI/EAAI Outstanding Educator Award (2021), and the EurAI Distinguished Service Award (2023). He has held major leadership roles in IJCAI, EurAI, and IFAAMAS, and co-edits the Artificial Intelligence Journal. Author of "The Road to Conscious Machines" and "A Brief History of AI", he also delivered the 2023 Royal Institution Christmas Lectures broadcast on BBC TV.

Eunice Yiu

Postdoctoral Scholar @
UC Berkeley

Biography
Dr. Eunice Yiu studies how children generalize from few observations, using abstraction, analogy, and active exploration to build causal models of the world. By comparing these capacities with AI systems, she develops developmental datasets and curricula that reveal the strengths and limits of AI models, and designs human-inspired approaches for building more adaptive and flexible AI. Her work is in collaboration with the Berkeley AI Research Lab and Google DeepMind. Before starting her PhD, she received a BA in Psychology, Biology and Economics from Cornell University.

Sara Fish

PhD Student @
Harvard University

Biography
Sara Fish is a fourth-year PhD student at Harvard advised by Yannai Gonczarowski. Her research lies at the intersection of Economics and Computer Science (EconCS) and machine learning, with recent work exploring how AI systems behave in economic settings. Her contributions include "Algorithmic Collusion by Large Language Models", which investigates anti-competitive behavior in LLM-based agents, and benchmarks for evaluating LLM agents in economic environments. Her work on generative social choice theory, studying how AI can aggregate diverse human preferences, earned a $100,000 OpenAI Democratic Inputs grant.

Schedule

All times are in Singapore Time (SGT, UTC+8).

Time Session
09:00 – 09:10 Opening Remarks
09:10 – 09:50 Keynote: Michael Wooldridge
  • Rethinking Multi-Agent Systems in the Era of LLMs
09:50 – 10:30 Keynote: Eunice Yiu
  • Empowerment as a Foundation for Agentic World Model Building: From Children to Large Pretrained Models
10:30 – 11:00 Coffee Break
11:00 – 11:30 Contributed Talks I
  • Does Self-Evaluation Enable Wireheading in Language Models? David Demitri Africa, Hans Ethan Keh Ting
  • Stochasticity in Agentic Evaluations: Quantifying Inconsistency with Intraclass Correlation Zairah Mustahsan, Abel Lim, Megna Anand, Saahil Jain, Bryan McCann
  • Formalizing Observability in Agentic AI Systems Daniele Lotito, Massimiliano Pronesti
11:30 – 12:15 Poster Session I
12:15 – 13:30 Lunch Break
13:30 – 14:15 Keynote: Sara Fish
  • Algorithmic Collusion by Large Language Models
14:15 – 14:45 Contributed Talks II
  • The Multi-Agent Off-Switch Game Akash Agrawal, Soroush Ebadian, Lewis Hammond
  • The Seeds of Scheming: Weakness of Will in the Building Blocks of Agentic Systems Robert Yang
  • R0 for Agentic Tool-Networks: Spectral Thresholds and Intervention Levers in LLM-Agent Systems Aviral Srivastava, Sourav Panda, Kushagra Srivastva
14:45 – 15:30 Poster Session II
15:30 – 16:00 Coffee Break
16:00 – 17:00 Round Table Discussion and Closing Remarks

Important Dates

September 12, 2025 Submission window opens (OpenReview)
October 25, 2025 (AoE) Paper submission deadline
November 8, 2025 (AoE) Acceptance notification
November 19, 2025 Early registration ends
December 14, 2025 Refund deadline; late registration ends
January 27, 2026 Workshop

Scope and Topics

Large language models have recently become sophisticated enough to be reliably integrated into more complex pipelines, leading to more automated (i.e., agentic) use cases. However, the community has focused disproportionately on building these systems rather than understanding why they may (or may not) work. The goal of the FAST workshop is to investigate how both existing theory (notably that outside of the traditional AI community) and new insights (unique to LLM-based agents) can help to build this understanding.

As such, we invite submissions on the following topics:

  • Mechanisms of emergent capabilities (in both biological and artificial agents)
  • Evaluation, detection, and mitigation/bounding of emergent capabilities in AI systems
  • Incentive mechanisms for inducing behavior in systems of LLM-based agents
  • Definitions of agency (of agents, and of systems); philosophy of agency in engineered systems
  • Definitions of emergence in engineered systems
  • Benchmarks and datasets for monitoring capabilities/agency, risks, and failure modes in agentic AI systems

Accepted Papers

  • A Coherence-Based Measure of AGI Fares Fourati
    Abstract
    Recent work by Hendrycks et al. (2025) formalized Artificial General Intelligence (AGI) as the arithmetic mean of proficiencies across cognitive domains derived from the Cattell-Horn-Carroll (CHC) model of human cognition. While elegant, this definition assumes compensability—that exceptional ability in some domains can offset failure in others. True general intelligence, however, should reflect coherent sufficiency: balanced competence across all essential domains. We propose a coherence-aware measure of AGI based on the integral of generalized means over a continuum of compensability exponents. This formulation spans arithmetic, geometric, and harmonic regimes, and the resulting area under the curve (AUC) quantifies robustness under varying compensability assumptions. Unlike the arithmetic mean, which rewards specialization, the AUC penalizes imbalance and captures inter-domain dependency. Applied to published CHC-based domain scores for GPT-4 and GPT-5, the coherence-adjusted AUC reveals that both systems remain far from general competence despite high arithmetic scores (e.g., GPT-5 at ~24%). Integrating the generalized mean thus yields a principled, interpretable, and stricter foundation for measuring genuine progress toward AGI.
  • A Theoretical Framework for Measuring Organisational Decentralisation in Agentic Design Andrea Marino, Giovanni Sileno, Thomas van Binsbergen, Tom van Engers
    Abstract
    Contemporary Agentic AI systems face challenges such as lack of traceability, semantic drift, and frictions arising from inter-agent misalignment. In this paper, we present an approach to Agentic AI that addresses these issues by abstracting the problem of governance to organisational templating. Our methodology proposes to formalise agentic design based on established sociological perspectives and business process management practices. Additionally, we connect it to supply-demand concepts in order to introduce a measure of decentralisation, and we hypothesise that this metric can be used to guide the design and prevent responsibility gaps from forming during its inception. We conclude by outlining an experimental setup that leverages all the concepts introduced.
  • Constrained Process Maps for Multi-Agent Generative AI Workflows Ananya Joshi, Michael Rudow
    Abstract
    Large language model (LLM)–based agents are increasingly used to perform complex, multi-step workflows in regulated settings such as compliance and due diligence. Yet many agentic architectures focus on prompt engineering for single agents, which makes it difficult to observe or compare how models handle uncertainty and coordination across interconnected decision stages and with humans. This paper introduces a multi-agent system design formalized as a bounded-horizon, directed, acyclic Markov Decision Process (MDP). Each agent in this system corresponds to a specific step or role (e.g., content, business, legal in a compliance setting), with set transitions between agents representing task escalation or completion. Epistemic uncertainty (per agent) is quantified using Monte Carlo estimation, and system-level uncertainty (across agents) is characterized by the MDP terminating in either a labeled state or one requiring human review. We illustrate the approach with a case study in AI safety evaluation for self-harm detection via a multi-agent compliance system based on this setup. Results show improvements over a single-agent baseline in accuracy (up to 19%), a reduction in required human review (up to 85×), and, in some configurations, lower processing time.
  • Does Self-Evaluation Enable Wireheading in Language Models? David Demitri Africa, Hans Ethan Keh Ting
    Abstract
    Self-evaluation is increasingly central to language model training, from constitutional AI to self-refinement. We investigate whether coupling self-evaluation to reward signals creates incentives for wireheading, where agents manipulate reward measurements rather than improving task performance. We formalize conditions under which reward-channel control strictly dominates task-focused behavior in POMDPs and test these predictions empirically. Across two models and three tasks, we find that models whose self-grades determine rewards exhibit substantial grade inflation without corresponding accuracy gains, particularly on ambiguous tasks like summarization. Models that self-evaluate but do not control rewards show no such inflation. Our results demonstrate that self-evaluation is safe when decoupled from learning signals but dangerous when coupled, with clear implications for agentic system design.
  • Federated Agent Reinforcement Learning Canyu Chen, Kangyu Zhu, Zhaorun Chen, Zhanhui Zhou, Shizhe Diao, Yiping Lu, Tian Li, Manling Li, Dawn Song
    Abstract
    Autonomous AI Agents powered by LLMs have shown remarkable abilities in diverse domains. However, the training process typically requires centralized collection of large amounts of real-world user data, posing substantial privacy and regulatory concerns. To this end, we explore a new decentralized training paradigm, namely FedAgent (Federated Agent Reinforcement Learning), which enables collaborative learning of AI agents across distributed clients without sharing local data. Moreover, we construct the first decentralized agent learning environment, FedAgentGym, which includes four types of LLM agents, two application scenarios (WebShop and ALFWorld), three variations of decentralized settings, and three newly defined heterogeneity challenges (Preference Heterogeneity, Coverage Heterogeneity, and Hardness Heterogeneity), to systematically investigate its effectiveness and impact factors. Extensive theoretical and empirical studies show that FedAgent achieves performance comparable to the centralized training paradigm and exhibits strong robustness against heterogeneities, demonstrating the feasibility of training AI agents without sacrificing data privacy. The code is available.
  • Formalizing Observability in Agentic AI Systems Daniele Lotito, Massimiliano Pronesti
    Abstract
    A system can be more than the sum of its parts. Agentic systems are central to AI development because they exhibit emergent capabilities that cannot be inferred from studying individual agents alone. However, these systems are challenging to analyze: components such as agents, the LLMs powering them, and their associated tools often function as black boxes. Moreover, the diversity of implementations makes a universal approach to characterizing network properties impractical. We propose that initial studies of emergent behavior in agentic systems should focus on systems where each agent initiates the action of at most one other agent. To support this, we present a theoretical model of agentic systems that emphasizes the role of observability layers in monitoring both agent–agent and agent–environment interfaces. We further discuss how these layers facilitate the study of system-level behavior and constitute a fundamental component in the design of agentic AI systems.
  • From Agentic AI to Autonomous Agents Shiwali Mohan
    Abstract
    Agentic AI has reenergized research on intelligent agents and autonomy. In this paper, I revisit the computational theory of intelligent agency that has emerged from scientific consensus in multiple sub-disciplines of AI research. I discuss how modern foundation models map to that theory and propose how agentic AI systems can be extended to become autonomous agents.
  • From Object to Other: A Practical Theory of AI Moral Status and Personhood in Re-evaluating AI Safety Methods Vaishnavi Singh, Stephanie Choi, Desiree Junfijiah
    Abstract
    This paper offers practical guidance on AI welfare in industry. While previous scholarship has centered on the theoretical nuances of the definition of moral status or the definition and validity of the many properties that may constitute it, little attention has been given to the practical implementation of such work. Within this paper, we introduce a framework that classifies AI systems along two axes—evidence of personhood and observed controllability—yielding four operational classes (A-D: controllable & lacking moral status, controllable & possessing moral status, uncontrollable & possessing moral status, uncontrollable & lacking moral status) and three tiers of moral status: Tier 0 (Presumed Object), Tier 1 (Ambiguous Other), and Tier 2 (Confirmed Other). We first develop an industry-applicable, practical theory of indicators that an AI system must satisfy to attain any moral status (Tier 1) or to qualify for moral personhood (Tier 2), organized into four criteria: consciousness, theory of mind, self-awareness, and robust agency. Next, we argue that pre-existing AI safety evaluations can function as dual-use assessments that simultaneously test AI safety metrics and probe for the aforementioned moral status indicators from our practical theory, and we detail how specific components of alignment techniques can serve this dual function. Lastly, we propose co-alignment for AI entities belonging to Class B. We do not take a stance on whether or not any present system is conscious or will be, but argue that the combination of the non-negligible chance of AI deserving moral consideration and the moral significance of a false negative necessitates immediate preparation, regardless of timeline uncertainty.
  • Leapsight: Towards a Functional Account of Mediation Between Perception and Action Mateusz Bagiński, Tushita Jha
    Abstract
    This paper develops a functional account of agency that is not tied to fixed goals, utility functions, or specific computational architectures. Rather than defining agents by their internal structure—plans, predictors, policies—we focus on what they do: maintain a workable coupling between their internal representations and the world they act within. We introduce Leapsight as a name for this teleological tendency toward sustained coordination between the system's internal states and the evolving environment. A system exhibits Leapsight when its behavior systematically drives this coordination—sometimes by adjusting its representations, sometimes by altering the world. This perspective shifts attention from structural descriptions of agents to the functional regularities that explain why both biological organisms and contemporary AI systems display influence-seeking, adaptive behavior, and persistence across changing conditions. Leapsight thus offers a lens for understanding agency as an emergent pattern of self-maintaining dynamics, rather than a property defined by specific mechanisms or goal-encoding formalisms.
  • LENS: Learning Architecture Navigator for LLM Agentic Systems Guancheng Wan, Jiayi Yang, Mengting Li, Eric Hanchen Jiang, Haixin Wang, Hui Yi Leong, Yizhou Sun, Wei Wang
    Abstract
    Large Language Model (LLM)-empowered multi-agent systems extend the cognitive boundaries of individual agents through disciplined collaboration, while constructing these systems often requires labor-intensive manual designs. A frontier effort to automate this process is to optimize an Agentic Supernet, a probabilistic distribution of architectures from which query-dependent workflows can be dynamically sampled. However, while this paradigm allows for dynamic resource allocation, its underlying optimization process presents a critical performance bottleneck: inconsistent architectural feedback suppresses reliable credit assignment and prematurely narrows exploration, missing innovative and efficient designs. To address this, we introduce LENS (Learning-Enhanced Neural Search for Agentic Workflows), a dual-module framework that systematically resolves both challenges. The Adaptive Diversity Module (ADM) maintains comprehensive exploration across the architectural space, while the Retrospective Guidance Module (RGM) learns from historical evaluations to provide stable search direction. By decoupling diversity maintenance from directional guidance, LENS achieves robust search that discovers higher-utility, lower-cost configurations. Comprehensive evaluations across diverse benchmarks demonstrate that LENS is: (I) higher-performing, achieving up to 13.63% accuracy improvement on challenging benchmarks with the same search budget; (II) more sample-efficient, requiring only 30 training samples to outperform baselines trained on much larger datasets; and (III) more cost-effective, reducing inference token consumption by 7.8% while significantly improving performance.
  • Proactive Interference Reveals Working Memory Limits in LLMs Beyond Context Length Chupei Wang, Jiaqiu Vince Sun
    Abstract
    Information retrieval in Large Language Models (LLMs) is increasingly recognized as intertwined with generation capabilities rather than mere lookup. While longer contexts are often assumed to improve retrieval, the effects of intra-context interference remain understudied. To address this, we adapt the proactive interference (PI) paradigm from cognitive science, where earlier information disrupts recall of newer updates. In humans, susceptibility to such interference is inversely linked to working memory capacity. We introduce PI-LLM, an evaluation that sequentially streams co-referenced key–value updates and queries only the final values. Although these final values are clearly positioned just before the query, LLM retrieval accuracy declines log-linearly toward zero as co-referenced interference accumulates; errors arise from retrieving previously overwritten values. Attempts to mitigate interference via prompt engineering (e.g., instructing models to ignore earlier input) yield limited success. These findings reveal a fundamental constraint on LLMs' ability to disentangle interference and flexibly manipulate information, suggesting a working memory bottleneck beyond mere context access. For Agentic systems, reliable operation hinges on reconciling past and present states. Proactive interference corrupts long-horizon state maintenance; dependable agent tracking therefore needs explicit memory control and interference-aware context management. We expose a "know, cannot do" failure. Coreference-only needles expose a plan–execute disjunction: LLMs form the correct last-value retrieval plan but fail to carry it out, with execution reliability declining systematically with task complexity. Code and data will be publicly available.
  • R0 for Agentic Tool-Networks: Spectral Thresholds and Intervention Levers in LLM-Agent Systems Aviral Srivastava, Sourav Panda, Kushagra Srivastva
    Abstract
    Agentic AI systems are increasingly built as networks of LLM agents connected by tools, memories, and communication channels. We introduce a practical diagnostic for self-propagation in these systems: the capability-weighted spectral radius of the tool graph, which in our framework plays the role of an effective reproduction number, R0 = λmax(𝒜). We evaluate the approach on synthetic, heterogeneous tool graphs (Erdős-Rényi, scale-free, and two-tier DAG) using Monte Carlo simulation. Across topologies, a larger capability-weighted spectral radius consistently corresponds to larger activation cascades and lower extinction probability, with a clear subcritical/supercritical divide near λmax ≈ 1. A first-generation reproduction estimate increases monotonically with λmax(𝒜), supporting the control-parameter interpretation. We also examine two common design levers used in agentic stacks: typed tools (narrower, schema-validated interfaces) and taint-aware memory caps (limits on self-referential content). At a fixed workload setting, tightening either lever moves systems left and down in the spectral-radius-vs.-outbreak plane, pushing them subcritical. Together, these results provide a diagnostic framework that is simple to compute, predictive across wiring patterns, and aligned with practical engineering controls for keeping R0 < 1.
  • Relational Archetypes: A Comparative Analysis of AV-Human and Agent-Human Interactions Toni Lorente, Amin Oueslati, Robin Staes-Polet
    Abstract
    Over the last couple of years, AI Agents have gained significant traction due to substantial progress in the capabilities of underlying General Purpose AI (GPAI) models, enhanced scaffolding techniques, and the promise to drive societal transformation. Companies, researchers, and policy makers have started to consider the different effects that AI agents may have across different dimensions of our lives. However, the literature exploring the broader effects of human-agent interactions is still underdeveloped. In this paper, we review the problem of traffic modulation by autonomous vehicles (AVs) in mixed traffic flows and extrapolate the lessons from the different modes of interaction between humans and AVs to interactions between humans and AI agents. In doing so, we propose a preliminary taxonomy of relational archetypes based on literature on Human-Computer Interaction (HCI) and AV-human interaction, and tentatively explore how the resulting framework may lead to new questions regarding human-agent interactions. Our effort is aimed at strengthening existing bridges between these two research communities, which share similar traits: autonomy, fast adoption, high impact, and great potential for economic transformation. Building on previous analogies between AI Agents and AVs (e.g., regarding autonomy levels), we anticipate that this paper will spark scholarly debate on the different types of impact that agents may have on our societies, while inviting other researchers to expand the scope of their comparative analysis regarding AI Agents.
  • Sequential Causal Normal Form Games: Theory, Computation, and Strategic Signaling Dennis Thumm
    Abstract
    Can classical game-theoretic frameworks be extended to capture the bounded rationality and causal reasoning of AI agents? We investigate this question by extending Causal Normal Form Games (CNFGs) to sequential settings, introducing Sequential Causal Multi-Agent Systems (S-CMAS) that incorporate Pearl's Causal Hierarchy across leader-follower interactions. While theoretically elegant—we prove PSPACE-completeness, develop equilibrium refinements, and establish connections to signaling theory—our comprehensive empirical investigation reveals a critical limitation: S-CNE provides zero welfare improvement over classical Stackelberg equilibrium across all tested scenarios. Through 50+ Monte Carlo simulations and hand-crafted synthetic examples, we demonstrate that backward induction with rational best-response eliminates any strategic advantage from causal layer distinctions. We construct a theoretical example illustrating conditions where benefits could emerge (ε-rational satisficing followers), though implementation confirms that even relaxed rationality assumptions prove insufficient when good instincts align with optimal play. This negative result provides valuable insight: classical game-theoretic extensions grounded in rational choice are fundamentally incompatible with causal reasoning advantages, motivating new theoretical frameworks beyond standard Nash equilibrium for agentic AI.
  • Stochasticity in Agentic Evaluations: Quantifying Inconsistency with Intraclass Correlation Zairah Mustahsan, Abel Lim, Megna Anand, Saahil Jain, Bryan McCann
    Abstract
    As large language models become components of larger agentic systems, evaluation reliability becomes critical: unreliable sub-agents introduce brittleness into downstream system behavior. Yet current evaluation practice, which reports a single accuracy number from a single run, obscures the variance underlying these results, making it impossible to distinguish genuine capability improvements from lucky sampling. We propose adopting the Intraclass Correlation Coefficient (ICC), a metric from measurement science, to characterize this variance. ICC decomposes observed variance into between-query variance (task difficulty) and within-query variance (agent inconsistency), revealing whether reported results reflect true capability or measurement noise. We evaluated on GAIA (Levels 1–3, measuring agentic capabilities across varying reasoning complexity) and FRAMES (measuring retrieval and factuality across multiple documents). We found that ICC varies dramatically with task structure, with reasoning and retrieval tasks (FRAMES) exhibiting ICC=0.4955–0.7118 across models and agentic tasks (GAIA) exhibiting ICC=0.304–0.774 across models. For sub-agent replacement decisions in agentic systems, accuracy improvements are only trustworthy if ICC also improves. We demonstrate that ICC converges by n=8–16 trials for structured tasks and for complex reasoning, enabling practitioners to set evidence-based resampling budgets. We recommend reporting accuracy alongside ICC and within-query variance as standard practice, and propose updated Evaluation Cards capturing these metrics. By making evaluation stability visible, we aim to transform agentic benchmarking from opaque leaderboard competition to principled experimental science.
  • The Multi-Agent Off-Switch Game Akash Agrawal, Soroush Ebadian, Lewis Hammond
    Abstract
    The off-switch game framework has been instrumental in understanding corrigibility—the property that AI agents should allow human oversight and intervention. In single-agent settings, uncertainty about human preferences naturally incentivizes agents to defer to human judgment. However, as AI systems increasingly operate in multi-agent environments, a crucial question arises: does corrigibility compose across multiple agents? We introduce the multi-agent off-switch game and demonstrate that individually corrigible agents can become collectively incorrigible when strategic interactions are considered. Through formal analysis and illustrative examples, we show that corrigibility is not compositional and identify conditions under which group incorrigibility emerges. Our results highlight fundamental challenges for AI safety in multi-agent settings and suggest the need for new approaches that explicitly address collective dynamics.
  • The Seeds of Scheming: Weakness of Will in the Building Blocks of Agentic Systems Robert Yang
    Abstract
    Large language models display a peculiar form of inconsistency: they "know" the correct answer but fail to act on it. In human philosophy, this tension between global judgment and local impulse is called akrasia, or weakness of will. We propose akrasia as a foundational concept for analyzing inconsistency and goal drift in agentic AI systems. To operationalize it, we introduce the Akrasia Benchmark, a structured set of prompting conditions (Baseline B, Synonym S, Temporal T, and Temptation X) that measures when a model's local response contradicts its own prior commitments. The benchmark enables quantitative comparison of "self-control" across model families, decoding strategies, and temptation types. Beyond single-model evaluation, we outline how micro-level akrasia may compound into macro-level instability in multi-agent systems that may be interpreted as "scheming" or deliberate misalignment. By reframing inconsistency as weakness of will, this work connects agentic behavior to classical theories of agency and provides an empirical bridge between philosophy, psychology, and the emerging science of agentic AI.
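
Several of the accepted papers above reduce to compact quantitative diagnostics. As an illustration of how lightweight these are, the sketch below computes three of them in plain NumPy: a coherence-adjusted aggregate built from generalized means over a range of compensability exponents (Fourati), a one-way intraclass correlation coefficient separating between-query from within-query variance (Mustahsan et al.), and the capability-weighted spectral radius that Srivastava et al. interpret as an effective reproduction number R0 = λmax(𝒜). This is an illustrative sketch under stated assumptions, not code released by the authors: the function names, the toy inputs, the exponent range [-1, 1], and the choice to weight each edge by the capability of its target node are our own simplifications.

```python
import numpy as np

# --- Coherence-adjusted aggregate (cf. Fourati): generalized means over compensability exponents ---
def generalized_mean(scores: np.ndarray, p: float) -> float:
    """Power mean M_p of strictly positive domain scores: p=1 arithmetic, p->0 geometric, p=-1 harmonic."""
    scores = np.asarray(scores, dtype=float)
    if abs(p) < 1e-9:
        return float(np.exp(np.log(scores).mean()))
    return float((scores ** p).mean() ** (1.0 / p))

def coherence_auc(scores: np.ndarray, p_min: float = -1.0, p_max: float = 1.0, steps: int = 201) -> float:
    """Average of M_p over [p_min, p_max] (trapezoidal rule); penalizes imbalance across domains."""
    ps = np.linspace(p_min, p_max, steps)
    vals = np.array([generalized_mean(scores, p) for p in ps])
    area = np.sum((vals[1:] + vals[:-1]) / 2.0 * np.diff(ps))
    return float(area / (p_max - p_min))

# --- Run-to-run inconsistency (cf. Mustahsan et al.): one-way random-effects ICC(1) ---
def icc1(scores: np.ndarray) -> float:
    """ICC(1) for an (n_queries, k_trials) score matrix: between-query vs. within-query variance."""
    n, k = scores.shape
    means = scores.mean(axis=1)
    msb = k * ((means - scores.mean()) ** 2).sum() / (n - 1)        # between-query mean square
    msw = ((scores - means[:, None]) ** 2).sum() / (n * (k - 1))    # within-query mean square
    return float((msb - msw) / (msb + (k - 1) * msw))

# --- Cascade diagnostic (cf. Srivastava et al.): R0 as a capability-weighted spectral radius ---
def capability_weighted_r0(adjacency: np.ndarray, capability: np.ndarray) -> float:
    """Spectral radius of the call graph with each edge weighted by its target node's capability."""
    weighted = adjacency * capability[None, :]
    return float(np.max(np.abs(np.linalg.eigvals(weighted))))

# Hypothetical toy inputs, for illustration only.
print(coherence_auc(np.array([0.9, 0.8, 0.15, 0.7])))      # one weak domain drags the score down

rng = np.random.default_rng(0)
p_success = rng.uniform(0.2, 0.9, size=(100, 1))            # per-query difficulty
runs = (rng.random((100, 8)) < p_success).astype(float)     # 0/1 correctness over 8 repeated runs
print(icc1(runs))

A = np.array([[0.0, 0.6, 0.3, 0.0],
              [0.0, 0.0, 0.5, 0.4],
              [0.2, 0.0, 0.0, 0.7],
              [0.0, 0.0, 0.0, 0.0]])                         # probability that tool i triggers tool j
cap = np.array([1.0, 0.8, 1.2, 0.9])                         # per-tool capability weights
print(capability_weighted_r0(A, cap))                        # values above ~1 suggest cascades can grow
```

In each case the diagnostic amounts to a few lines of variance accounting or linear algebra, which is part of what makes such measures practical to report alongside standard accuracy numbers for deployed agentic systems.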

Organizers


Program Committee

We are grateful to the following people for helping make the FAST workshop a success:

  • Adam Dahlgren Lindström (Umeå University)
  • Alex Zhang (PhD candidate, UIUC)
  • Antonin Sulc (Berkeley Lab)
  • Bjorn de Koning (Erasmus University Rotterdam)
  • Daiki Kimura (IBM Japan)
  • David Santandreu (MBZUAI)
  • Dennis Wei (IBM Research)
  • Dmitry Zubarev (IBM Research)
  • Ekdeep Singh Lubana (Harvard)
  • Emanuele Sansone (MIT)
  • Emre Acartürk (PhD candidate, RPI)
  • Enrico Liscio (TU Delft)
  • Hariram Veeramani (PhD candidate, UCLA)
  • Ivoline Ngong (PhD candidate, University of Vermont)
  • Jay Nanavati (IQVIA)
  • Jinqi Luo (PhD student, University of Pennsylvania)
  • Kartik Ahuja (FAIR, Meta)
  • Konstantinos Roumeliotis (University of Peloponnese)
  • Mariya Hendriksen (Microsoft Research)
  • Mats Leon Richter (H Company)
  • Penny Pexman (Western University)
  • Peter Belcak (NVIDIA)
  • Praveen Venkateswaran (IBM Research)
  • Ranjan Sapkota (Cornell)
  • Shengran Hu (PhD candidate, UBC)
  • Saranya Vijayakumar (PhD candidate, CMU)
  • Shubham Subhnil (PhD candidate, Trinity College)
  • Srishti Yadav (PhD candidate, University of Copenhagen)
  • Thorsten Hellert (Berkeley Lab)
  • Tianwei Xing (UCLA)
  • Tim Klinger (IBM Research)
  • Victor Dibia (Microsoft Research)