Set against the NAIC’s regulatory arc on AI, which runs from principles (2020) to a model bulletin (2023) to evaluation tools (2025–2026), the pivot to agentic AI marks a genuine inflection in how regulators conceptualize autonomous systems in insurance. At the Spring 2026 National Meeting in San Diego on March 24, the NAIC’s Big Data and Artificial Intelligence (H) Working Group devoted a full panel to agentic AI risks in insurance. The discussion surfaced three categories of concern: cascading errors across autonomous decision chains, accountability gaps when no single human oversees the full workflow, and performance limitations that traditional ML governance frameworks do not address.

This matters because every governance document the NAIC has produced to date, from the 2020 AI Principles to the December 2023 Model Bulletin to the four-exhibit evaluation tool now in a 12-state pilot, was designed for a world where AI meant a predictive model with defined inputs, a training dataset, and a single output. Agentic AI breaks that model. When an LLM agent chains together multiple reasoning steps, calls external tools, and takes actions with downstream consequences, the traditional framework of “validate the model, document the data, test for bias” becomes necessary but insufficient.

No actuarial outlet has yet analyzed what the NAIC’s agentic AI focus means for the governance frameworks insurers are already building, for the evaluation tool pilot that 12 states are running right now, or for the practical validation challenges facing actuaries whose sign-off authority will soon extend to autonomous systems they may not fully understand.

What the NAIC Said at the Spring 2026 Panel

The Big Data and AI Working Group’s March 24 session had two agenda items: operationalizing the 2023 Model Bulletin, and a panel discussion on emerging AI governance trends. It was this second item, featuring a presentation by PwC, that broke new ground.

PwC discussed the evolving risk landscape of agentic AI systems and provided an overview of both potential value and potential risk. The firm highlighted the importance of AI governance while also stressing the need to develop frameworks that expedite responsible AI use rather than create unnecessary impediments. That framing is notable: regulators invited a Big Four firm to present not just on risks, but on how governance should avoid becoming a bottleneck. The NAIC has historically leaned toward caution, so the explicit acknowledgment that overly restrictive frameworks could slow innovation signals awareness that the insurance industry is already deploying these systems in production.

The panel identified three specific risk categories for agentic AI that go beyond what the 2023 Model Bulletin addresses:

Cascading errors across multiple agents. When a traditional predictive model fails, the failure is contained: a single output (a risk score, a rate relativity, a claim estimate) is wrong, and the error can be traced to a specific data input or model parameter. When an agentic system fails, the error propagates. Agent A makes a recommendation, Agent B acts on it, Agent C adjusts downstream decisions based on B’s output. By the time a human reviews the result, the original error is buried in a chain of apparently reasonable intermediate steps. This is fundamentally different from model risk as actuaries have traditionally understood it.

Accountability gaps. The 2023 Model Bulletin requires insurers to designate “a person or persons who are responsible for the AI system.” That requirement was drafted with a specific mental model: one model, one owner, one accountability chain. Agentic systems challenge this because they operate across organizational boundaries. An underwriting agent might draw on a claims model, a pricing algorithm, and external data sources in a single decision workflow. Which designated person owns that composite output? The panel noted that this question does not have a clear answer under existing guidance.

Performance limitations that governance frameworks miss. Traditional model validation tests a model against holdout data, measures discrimination metrics, and checks for stability over time. But agentic systems can exhibit failure modes that these tests will not catch: an agent that performs well on routine tasks but fails unpredictably when encountering edge cases outside its training distribution, or an agent that optimizes for a proxy metric rather than the actual business objective. The panel emphasized that existing governance frameworks need redesigning, not just extending, to cover these scenarios.

The Mitigation Strategies on the Table

Beyond identifying risks, the panel outlined four mitigation strategies that regulators and carriers should be developing in parallel:

Monitoring the use of agents. This goes beyond traditional model monitoring. For a GLM or gradient-boosted model, monitoring means tracking prediction distributions, feature drift, and out-of-time performance. For an agentic system, monitoring must also capture the decision path: what tools the agent called, what intermediate reasoning it produced, and whether it escalated appropriately when encountering uncertainty. AIG’s orchestration layer, built on Palantir Foundry, represents one approach to this: it coordinates multiple AI agents across the enterprise while maintaining centralized visibility into agent actions. But most carriers deploying agentic tools today lack this infrastructure.
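
To make that concrete, here is a minimal sketch of what a decision-path trace might capture, going beyond prediction-level monitoring. All names (AgentStep, DecisionTrace) and fields are illustrative assumptions, not any vendor’s schema or the Foundry API:

```python
# A minimal sketch of decision-path logging for an agentic workflow.
# All names here (AgentStep, DecisionTrace) are illustrative, not any
# vendor's API; the fields follow the monitoring goals named above:
# tools called, intermediate reasoning, and escalation behavior.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentStep:
    agent: str                 # which agent acted (e.g., "triage", "pricing")
    tools_called: list[str]    # external tools or APIs invoked at this step
    summary: str               # human-readable account of the intermediate output
    uncertainty: float         # the agent's self-reported confidence gap, 0.0-1.0
    escalated: bool            # whether this step handed off to a human
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class DecisionTrace:
    workflow_id: str
    steps: list[AgentStep] = field(default_factory=list)

    def record(self, step: AgentStep) -> None:
        self.steps.append(step)

    def unescalated_high_uncertainty(self, threshold: float = 0.5) -> list[AgentStep]:
        """Steps a reviewer should examine: uncertain but never escalated."""
        return [s for s in self.steps if s.uncertainty >= threshold and not s.escalated]
```

The design point is that the unit of monitoring is the trace, not the model: a reviewer asks what the chain did, not just what a score was.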

Establishing clear accountability. The panel recommended that carriers define accountability not just for the agentic system as a whole, but for each link in the decision chain. This means mapping which business function owns each agent, which data governance rules apply to each data source the agent accesses, and which human has escalation authority when the chain produces an unexpected output. For actuaries, this has direct implications: if an agentic underwriting system produces a rate indication, the signing actuary needs to understand (and be able to explain to a regulator) every step in the chain that produced it.
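
One way to operationalize per-link accountability is a simple map from each agent to its owning function, governing data rules, and escalation authority. The agents, roles, and policy names below are hypothetical, invented purely for illustration; nothing here is an NAIC-prescribed format:

```python
# Illustrative sketch of an accountability map for a multi-agent
# underwriting workflow. Every entry below is a hypothetical example.
ACCOUNTABILITY_MAP = {
    "document_extraction_agent": {
        "owner": "Underwriting Operations",
        "data_rules": ["third-party data diligence policy"],
        "escalation_authority": "Chief Underwriting Officer",
    },
    "risk_scoring_agent": {
        "owner": "Actuarial - Pricing",
        "data_rules": ["rating data governance standard"],
        "escalation_authority": "Signing Actuary",
    },
    "pricing_recommendation_agent": {
        "owner": "Actuarial - Pricing",
        "data_rules": ["rate filing documentation standard"],
        "escalation_authority": "Signing Actuary",
    },
}

def owner_of(agent: str) -> str:
    """Fail loudly if any link in the chain lacks a designated owner."""
    entry = ACCOUNTABILITY_MAP.get(agent)
    if entry is None:
        raise LookupError(f"No accountability entry for agent {agent!r}")
    return entry["owner"]
```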

Redesigning governance frameworks for agentic AI. The most significant recommendation was that existing governance frameworks, including the structures carriers built to comply with the 2023 Model Bulletin, need fundamental redesign to account for multi-agent workflows. PwC emphasized that governance should “expedite responsible AI use rather than create unnecessary impediments,” a framing that suggests the redesign should make governance more dynamic and workflow-integrated rather than adding another layer of static documentation requirements. In practice, this likely means moving from periodic model reviews to continuous monitoring, from individual model validation to end-to-end workflow testing, and from static risk assessments to real-time risk scoring of agentic decision chains.
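
A toy calculation shows why chain-level risk scoring differs from per-model review. Under the simplifying assumption that per-step errors are independent (real systems would need correlated-failure modeling), chained agents compound risk:

```python
# A sketch of why chained agents compound risk, assuming per-step error
# probabilities are independent: chain reliability is the product of
# step reliabilities, so chain error grows with every added agent.
def chain_error_probability(step_error_rates: list[float]) -> float:
    reliability = 1.0
    for p in step_error_rates:
        reliability *= (1.0 - p)
    return 1.0 - reliability

# Four agents, each 98% reliable in isolation:
print(chain_error_probability([0.02, 0.02, 0.02, 0.02]))  # ~0.078
```

Four agents that are each 98% reliable in isolation yield a chain that fails nearly 8% of the time; that arithmetic is the quantitative core of the cascading-error concern.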

Human-in-the-loop escalation for high-risk cases. The panel was explicit that agentic AI should not operate fully autonomously in high-risk insurance decisions. This aligns with PwC’s broader position that current AI agent implementations should maintain “human-in-the-loop controls for higher-risk decisions” in underwriting and claims handling. The practical challenge is defining what constitutes “high risk” in an agentic context, where the system itself may not recognize when it has entered territory that warrants escalation.
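
As a sketch only, an escalation rule might combine the risk tier, coverage impact, and the system’s self-reported uncertainty. The tier names, threshold, and parameters below are assumptions for illustration, not a standard:

```python
# A hedged sketch of an escalation rule for agentic insurance decisions.
# Tier names and the 0.3 threshold are assumptions; in practice they
# would come from the carrier's risk taxonomy and be calibrated against
# reviewed outcomes (see the calibration discussion later in this piece).
def should_escalate(risk_tier: str, model_uncertainty: float,
                    coverage_impact: bool) -> bool:
    if risk_tier in ("unacceptable", "high"):
        return True                   # never fully autonomous at these tiers
    if coverage_impact:
        return True                   # decisions touching coverage go to a human
    return model_uncertainty >= 0.3   # assumed threshold, to be calibrated

print(should_escalate("medium", 0.12, coverage_impact=True))   # True
print(should_escalate("low", 0.05, coverage_impact=False))     # False
```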

How Agentic AI Differs from the Models the 2023 Bulletin Covers

To understand why the NAIC’s pivot matters, it helps to map the specific ways agentic AI differs from the traditional ML systems the December 2023 Model Bulletin was designed to govern.

The Model Bulletin, adopted by 24 states and the District of Columbia as of the Spring 2026 meeting, requires insurers to implement written AI governance programs that cover risk management, documentation, third-party oversight, and consumer protection. Four states have adopted additional insurance-specific AI regulations on top of the bulletin. The bulletin’s framework is built around four core assumptions:

  1. Defined inputs and outputs. The bulletin assumes a model takes in data and produces a prediction or classification that a human can evaluate. Agentic systems take actions, not just predictions.
  2. Identifiable training data. The bulletin requires documentation of the data used to develop and validate AI systems. An LLM agent may draw on its pre-training corpus, retrieval-augmented generation sources, real-time API calls, and prior conversation context in a single decision. Documenting “the data” for that system is a qualitatively different exercise.
  3. Static decision boundaries. The bulletin’s approach to bias testing and fairness evaluation assumes the model will behave consistently given the same inputs. Agentic systems can produce different outputs for identical inputs depending on context, prior interactions, and the specific agent orchestration path taken.
  4. Single-model ownership. As noted above, the bulletin’s accountability requirements assume a clear line between one AI system and one responsible person or team. Multi-agent workflows dissolve that line.

None of this means the Model Bulletin is irrelevant. Its principles around consumer protection, fairness, and governance program structure remain foundational. But the operational requirements, specifically around documentation, testing, and accountability, need significant extension to cover systems that reason, act, and chain decisions autonomously.

The NAIC’s Regulatory Risk Taxonomy

Alongside the agentic AI discussion, NAIC staff presented a four-level risk taxonomy for categorizing AI systems in insurance:

  • Unacceptable: subliminal manipulation and social scoring. Insurance example: systems that exploit behavioral biases to increase premium acceptance.
  • High: potential for significant harm if a failure occurs. Insurance examples: automated claims denial, underwriting triage with coverage impact.
  • Medium: requires transparency (e.g., chatbots, emotion recognition). Insurance examples: customer service AI, sentiment analysis in claims calls.
  • Low: minimal restrictions. Insurance examples: spam filters, internal document search, scheduling tools.

This taxonomy mirrors the EU AI Act’s tiered approach, and its emergence at the NAIC level suggests regulators are moving toward risk-proportionate oversight rather than uniform requirements across all AI applications. For carriers deploying agentic systems, the question becomes: where do multi-agent workflows land in this taxonomy? An agentic underwriting system that chains together data extraction, risk scoring, and pricing recommendation almost certainly qualifies as high-risk. But what about an agentic system that handles routine customer inquiries and occasionally escalates to claims intake? The boundaries are not yet defined.
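
One plausible convention, offered here as an assumption rather than NAIC guidance, is that a workflow inherits the highest risk tier of any agent in its chain, since the composite output can be no safer than its riskiest link:

```python
# One possible classification convention (an assumption, not NAIC
# guidance): a multi-agent workflow inherits the highest risk tier of
# any agent in its chain.
TIER_ORDER = {"low": 0, "medium": 1, "high": 2, "unacceptable": 3}

def workflow_tier(agent_tiers: list[str]) -> str:
    return max(agent_tiers, key=lambda t: TIER_ORDER[t])

# Data extraction (low) + risk scoring (high) + pricing (high) -> "high"
print(workflow_tier(["low", "high", "high"]))
```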

The 12-State Evaluation Tool Pilot and Its Agentic Gap

Running concurrently with the Spring Meeting discussion, the NAIC’s AI Systems Evaluation Tool pilot is testing regulators’ ability to assess carrier AI governance in practice. The pilot launched March 2, 2026, and runs through September 2026, with 12 participating states: California, Colorado, Connecticut, Florida, Iowa, Louisiana, Maryland, Pennsylvania, Rhode Island, Vermont, Virginia, and Wisconsin.

The evaluation tool uses four exhibits to structure the assessment:

  • Exhibit A quantifies AI system usage: how many models are in production, what consumer complaints have been received, and what future AI plans exist.
  • Exhibit B assesses governance and risk management frameworks, offered in both narrative and checklist formats.
  • Exhibit C gathers detailed information on high-risk models with automated decision-making capability.
  • Exhibit D documents data sources, AI model provenance, and vendor relationships.

The tool was designed before the NAIC formally identified agentic AI as a distinct risk category. Exhibit C, which focuses on “high-risk models with automated decision-making,” comes closest to capturing agentic scenarios, but its framing still assumes the traditional model structure: one model, one decision, one set of inputs to document. An agentic workflow that chains three or four models together with tool-calling and branching logic does not fit neatly into a single Exhibit C response.

This matters for the pilot timeline. The NAIC plans to update the evaluation tool based on pilot feedback in September through October 2026 and re-expose it for public review, with formal adoption expected at the Fall 2026 National Meeting. If the agentic AI discussion from the Spring Meeting is going to influence the final tool, the window for that feedback is narrow. Carriers participating in the pilot who are also deploying agentic systems should be documenting where the current tool’s structure falls short, because those gaps will shape what regulators ask for in the permanent version.

Industry response to the pilot has been mixed. Panelists at the Spring Meeting highlighted scope definition difficulties and the resource disparity between large and small insurers. A carrier with a dedicated AI governance team can assemble comprehensive Exhibit B and C responses. A regional mutual deploying a single vendor-provided AI tool may lack the staff to complete the same documentation. The NAIC’s stated goal is a size-agnostic, risk-based approach, but the operational burden of that approach falls unevenly, and agentic systems will widen that gap further. For a deeper look at how the evaluation tool pilot is playing out, see our analysis of the 12-state pilot and industry pushback.

Carrier Agentic Deployments Already in Production

The NAIC’s Spring 2026 discussion did not occur in a vacuum. Several major carriers have already deployed agentic AI systems that would fall within the risk categories the panel identified.

AIG: orchestration layer with multiple LLM agents. AIG’s partnership with Palantir has produced what CEO Peter Zaffino described as “companions that operate with our teams,” providing real-time insights and challenging underwriting decisions. Through Lloyd’s Syndicate 2479, launched January 1, 2026, AIG deployed LLM agents to analyze a delegated authority portfolio managing $300 million in premium. The agents use Palantir’s Foundry platform to build ontologies that map entities, risks, and relationships, allowing AI systems to reason about them. AIG’s Lexington Insurance unit processed over 370,000 E&S submissions in 2025, with a target of 500,000 by 2030, and Zaffino described “a massive change in our ability to process submission flow without additional human capital resources.” For 2026, AIG’s most forward-leaning initiative is developing the orchestration layer to coordinate multiple AI agents across the enterprise at scale.

Travelers: agentic claims assistant and 10,000 AI-equipped staff. In January 2026, Travelers announced its Anthropic partnership to deploy personalized AI assistants to nearly 10,000 engineers, data scientists, analysts, and product owners. In February 2026, the carrier launched an AI Claim Assistant developed with OpenAI, a fully agentic intelligent voice system that handles customer claim calls for auto damage filing. OpenAI’s Head of Platform Product called it “one of the most sophisticated agentic voice implementations capable of consulting, advising and supporting customers through the full complexity of a claim conversation.” The system can consult policies, guide customers through decisions, and escalate to live agents, but it is making consequential decisions in real time about how to characterize damage, what policy provisions to cite, and when to escalate.

Where these deployments intersect with NAIC risk categories. Both of these live deployments raise the exact concerns the NAIC panel identified:

  • Cascading errors: AIG’s multi-agent orchestration layer, by design, chains decisions across agents. If the ontology mapping mischaracterizes a risk relationship, that error propagates through every downstream agent action.
  • Accountability: Travelers’ AI Claim Assistant operates at the customer interface. When it characterizes policy coverage or guides a filing decision, who is accountable: the claims department, the technology team, the vendor (OpenAI), or the compliance function?
  • Performance limitations: Both systems are operating in production with human escalation paths. But neither carrier has disclosed how those escalation triggers are calibrated, what false-negative rates they accept for escalation failures, or how they test for edge cases outside the training distribution.

The Third-Party Vendor Layer

The agentic AI governance challenge is compounded by the third-party vendor question. At the same Spring Meeting, the NAIC advanced its proposal for a registry of vendors that supply AI models and datasets to insurers. The Third-Party Data and Models Working Group narrowed its initial scope to focus on pricing and underwriting functions, with outstanding questions around whether registration would be mandatory or voluntary.

For agentic systems, the vendor question becomes especially complex. AIG’s agentic stack involves Palantir (platform), multiple LLM providers (models), and proprietary ontology construction (AIG’s own IP). Travelers’ deployment involves Anthropic (internal assistants), OpenAI (claims assistant), and internal engineering (TravAI platform). In both cases, the agentic behavior emerges from the interaction between carrier systems and vendor components. A registry that catalogs which vendors provide which models is a useful starting point. But it does not capture the agentic dimension: how those components are orchestrated, what autonomy each agent has, and where human oversight sits in the chain.

The NAIC was clear that the proposed registry “is not intended to relieve insurers of their existing vendor diligence and management obligations.” Carriers remain responsible for the outputs of systems built on vendor components. For actuaries performing model validation on agentic workflows, this means the scope of validation must extend beyond the individual vendor model to the full orchestration architecture, including how agents interact, what data flows between them, and what happens when one agent in the chain fails or produces an unexpected output.

The Broader State-Level Regulatory Landscape

The NAIC’s agentic AI discussion sits within a rapidly evolving state regulatory landscape. In March 2026, the NAIC published an Issue Brief on Artificial Intelligence and State Insurance Regulation that formally articulated its position: AI in insurance should be governed through state-based oversight, not federal preemption, consistent with the McCarran-Ferguson framework that has structured insurance regulation since 1945.

Several state-level initiatives are running in parallel:

  • Colorado SB 21-169 requires insurers to demonstrate that AI systems do not produce unfair discrimination. Auto and health insurers face July 1, 2026, compliance deadlines for annual reporting on algorithmic fairness. Life insurers have been in scope since 2023 under Regulation 10-1-1. Colorado’s framework is insurance-specific and enforced by the Division of Insurance, separate from the broader Colorado AI Act (SB 24-205) enforced by the Attorney General.
  • The model act proposed by the National Council of Insurance Legislators (NCOIL) would require insurers to have “qualified human professionals” make final claims decisions, maintain action records, and disclose AI use to consumers. This directly constrains fully autonomous agentic claims processing.
  • Individual state adoptions of the Model Bulletin continue to expand. The 24 states plus D.C. that have adopted the bulletin represent the broadest consensus position, but four states have gone further with additional insurance-specific AI regulations.

None of these frameworks specifically addresses agentic AI. Colorado’s SB 21-169 focuses on algorithmic discrimination testing, which is a necessary but insufficient check for multi-agent systems. The NCOIL model act’s requirement for human final decisions is relevant but does not address the intermediate steps in an agentic chain where errors can compound before reaching the human decision point. The Spring 2026 panel represents the first time the NAIC has identified this gap explicitly, but translating that recognition into regulatory guidance will take additional meeting cycles.

What Actuaries Validating Agentic Systems Need to Consider

For actuaries whose responsibilities include model validation, appointed actuary opinions, or compliance attestation, the NAIC’s agentic AI focus raises questions that go beyond traditional model risk management frameworks.

Scope of validation. ASOP No. 56 (Modeling) requires actuaries to understand the model, its intended purpose, and its limitations. For a GLM or a gradient-boosted tree, this is achievable through documentation review, parameter inspection, and holdout testing. For an agentic system, “understanding the model” means understanding the full orchestration: which agents interact, what decision paths are possible, how context and prior actions influence subsequent agent behavior, and where the system can fail in ways that are not visible from examining any individual component.
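
A small example illustrates why. Even a modest agent graph branches into multiple end-to-end decision paths, and validation coverage is better stated against that path count than against a model inventory. The graph below is hypothetical:

```python
# A minimal sketch of why "understanding the model" expands for agentic
# systems: even a small hypothetical agent graph has several distinct
# end-to-end decision paths, each of which needs validation coverage.
def enumerate_paths(graph: dict[str, list[str]], node: str) -> list[list[str]]:
    if not graph.get(node):                   # terminal agent: path ends here
        return [[node]]
    paths = []
    for nxt in graph[node]:
        for tail in enumerate_paths(graph, nxt):
            paths.append([node] + tail)
    return paths

AGENT_GRAPH = {
    "intake": ["triage"],
    "triage": ["fast_track", "full_review"],  # branching logic
    "fast_track": ["pricing"],
    "full_review": ["referral", "pricing"],
    "pricing": [],
    "referral": [],
}
for p in enumerate_paths(AGENT_GRAPH, "intake"):
    print(" -> ".join(p))   # three distinct end-to-end paths to test
```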

Documentation challenges. The Model Bulletin requires documentation of data sources, model methodology, and governance procedures. For agentic systems, the documentation burden is an order of magnitude larger. Each agent in the chain may have its own data sources, its own decision logic (some of which may be opaque, as with LLM reasoning), and its own failure modes. The interactions between agents create emergent behaviors that cannot be fully predicted from the individual agent documentation alone. Actuaries need to push for documentation standards that capture the system architecture, not just the individual model cards.
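
A workflow-level manifest is one way to sketch what architecture-level documentation might contain. The structure, names, and fields below are assumptions for illustration, not a standard or any regulatory exhibit format:

```python
# An illustrative workflow-level manifest, sketching the system-
# architecture documentation argued for above. All names are
# hypothetical; the point is that agents, their data sources, and the
# edges between them are documented together, not as separate model cards.
WORKFLOW_MANIFEST = {
    "workflow": "umbrella_underwriting_v2",   # hypothetical workflow name
    "agents": {
        "extractor": {"model": "vendor-llm-a", "data": ["submission docs"]},
        "scorer":    {"model": "internal-glm-7", "data": ["rating tables"]},
        "pricer":    {"model": "vendor-llm-b",
                      "data": ["rate manual", "scorer output"]},
    },
    "edges": [("extractor", "scorer"), ("scorer", "pricer")],  # who feeds whom
    "human_touchpoints": ["pricer output above referral threshold"],
}
```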

Testing for fairness in multi-agent workflows. Bias testing for a single model is well-understood: run the model on test populations segmented by protected class, measure disparate impact, and document the results. For an agentic workflow, bias can emerge at any step in the chain, and the cumulative effect of small biases across multiple agents can be larger than testing any individual agent would reveal. End-to-end fairness testing across the full workflow is necessary, but no standardized methodology exists for it yet.
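
As a hedged sketch of what that could look like: run the full workflow on segmented test populations and measure disparate impact at the final output, where per-step biases have already compounded. The `run_workflow` callable is an assumption standing in for the carrier’s actual orchestration:

```python
# A sketch of end-to-end fairness testing for a multi-agent workflow.
# `run_workflow` is assumed to execute the full chain and return True
# for a favorable outcome; measuring at the final output captures the
# cumulative effect of per-step biases.
from typing import Callable

def disparate_impact(run_workflow: Callable[[dict], bool],
                     group_a: list[dict], group_b: list[dict]) -> float:
    """Ratio of favorable-outcome rates between two test populations."""
    rate_a = sum(run_workflow(x) for x in group_a) / len(group_a)
    rate_b = sum(run_workflow(x) for x in group_b) / len(group_b)
    return rate_a / rate_b  # values far from 1.0 warrant investigation

# A common (but not universal) rule of thumb flags ratios below 0.8.
```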

The escalation calibration problem. Both AIG and Travelers have described human-in-the-loop escalation as a key safeguard. But the effectiveness of escalation depends entirely on calibration: how sensitive are the triggers, and what is the false-negative rate? If an agentic system escalates too rarely, harmful decisions pass through unchecked. If it escalates too frequently, the system loses its efficiency value and becomes a traditional workflow with AI suggestions. Actuaries validating these systems need access to escalation trigger logic and performance data on escalation accuracy.
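
Measuring that calibration is straightforward once a labeled review sample exists, meaning cases a human retrospectively judged to need escalation. The sketch below assumes such labels are available; producing them is the hard part:

```python
# A sketch of measuring escalation calibration from a labeled review
# sample. `needed` and `escalated` are parallel boolean lists: what a
# human reviewer judged necessary versus what the system actually did.
# The labeling pipeline is assumed, not specified here.
def escalation_rates(needed: list[bool], escalated: list[bool]) -> dict[str, float]:
    fn = sum(n and not e for n, e in zip(needed, escalated))  # missed escalations
    fp = sum(e and not n for n, e in zip(needed, escalated))  # unnecessary ones
    pos = sum(needed) or 1
    neg = (len(needed) - sum(needed)) or 1
    return {
        "false_negative_rate": fn / pos,  # harmful decisions passing unchecked
        "false_positive_rate": fp / neg,  # efficiency lost to over-escalation
    }
```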

Professional liability exposure. If an actuary signs off on a rate filing or reserve opinion that relies on outputs from an agentic system, and that system subsequently produces unfairly discriminatory outcomes or materially inaccurate results, the actuary’s professional exposure is not reduced by the fact that the error originated in an AI agent rather than a traditional model. ASOP No. 56’s requirements around understanding models and their limitations apply regardless of the model architecture. Actuaries who cannot demonstrate that they understood the agentic system’s decision chain are vulnerable to both regulatory and professional discipline proceedings.

The Timeline from Recognition to Regulation

History suggests the path from NAIC issue identification to enforceable guidance is measured in years, not months. The NAIC adopted its AI Principles in August 2020. The Model Bulletin took until December 2023, over three years later. The evaluation tool was developed through 2024 and 2025 and entered its pilot phase in March 2026. At that pace, specific agentic AI guidance would arrive no earlier than 2028.

But the pace of carrier agentic deployment is moving faster than prior AI adoption cycles. AIG launched its LLM agents at Lloyd’s on January 1, 2026. Travelers deployed its agentic claims assistant in February 2026. Multiple vendor platforms (Palantir, OpenAI, Anthropic) are actively marketing agentic capabilities to carriers. By 2028, the question will not be whether carriers have deployed agentic systems, but how deeply embedded those systems have become in production workflows.

This creates a window, likely 18 to 24 months, where carrier agentic deployments will outpace regulatory guidance. Actuaries and compliance teams operating in this window face a familiar dilemma: wait for explicit guidance and risk falling behind competitors, or move forward with best-effort governance and risk a future regulatory finding that the governance was insufficient.

The NAIC’s four-level risk taxonomy and the evaluation tool pilot provide partial scaffolding. Carriers can classify their agentic deployments under the high-risk tier and apply enhanced governance accordingly. They can use the evaluation tool’s Exhibit C framework as a starting point for agentic documentation, even though it was not designed for that purpose. And they can implement the panel’s four mitigation strategies (monitoring, accountability, governance redesign, human escalation) as an interim framework pending formal guidance.

But the core challenge remains: the governance infrastructure the industry has spent two years building in response to the 2023 Model Bulletin is not designed for the systems carriers are now deploying. The NAIC recognizes this. The Spring 2026 panel was the first step toward addressing it. What comes next will determine whether the regulatory framework catches up before the gap between governance and deployment becomes a source of consumer harm.

Why This Matters for the Profession

The NAIC’s distinction between traditional AI and agentic AI is not an academic exercise. It reflects a material change in how autonomous systems operate within insurance value chains, and it carries direct implications for practicing actuaries in multiple roles:

  • Pricing actuaries whose rate filings depend on AI-generated risk scores need to know whether those scores came from a single model or a multi-agent workflow, because the validation requirements are fundamentally different.
  • Reserving actuaries relying on AI-assisted claims estimates need to understand the decision chain that produced those estimates, especially when agentic claims systems are making triage and severity judgments that feed directly into IBNR calculations.
  • Chief actuaries signing appointed actuary opinions need to assess whether their companies’ AI governance programs, built to comply with the Model Bulletin, are adequate for the agentic systems now entering production.
  • Consulting actuaries advising carriers on AI governance need to update their frameworks to address multi-agent workflows, escalation calibration, and the vendor orchestration layer that most agentic deployments involve.

The NAIC has identified the governance gap. The question now is whether the profession moves to close it before regulators mandate specific requirements, or whether actuaries find themselves retrofitting governance onto systems that have already been running in production for years.
