From reviewing patent filings and earnings transcripts across the top 20 carriers, we have tracked the progression of AI agent autonomy from minutes-long tasks in 2024 to the 30-hour sessions AIG disclosed this week. That number matters because it crosses a threshold that existing governance frameworks were not designed to accommodate. When an AI agent can operate autonomously for longer than an underwriter's working day, the "human in the loop" assumption that underpins NAIC model bulletins, EU AI Act Article 14, and every carrier's internal model risk management policy encounters a structural test.

CEO Peter Zaffino put the trajectory in context on the May 1, 2026 earnings call: "When we began our work with Claude 2.0, AI agents could operate autonomously for less than an hour. Today, they can run autonomously for as long as 30 hours." That is not an incremental improvement. It represents a qualitative shift in how carriers must think about oversight, audit trails, and the actuarial treatment of AI-generated decisions.

The Autonomy Leap: From Sub-One-Hour to 30-Hour Cycles

The path from one-hour autonomous operation to 30-hour cycles did not happen through a single model upgrade. It required three parallel developments, in model capabilities, orchestration architecture, and multi-agent decomposition, that converged during the second half of 2025.

First, the underlying model capabilities expanded. Claude 2.0, the version AIG initially deployed, could process individual submissions and return structured outputs, but it required frequent human intervention to maintain context across tasks. The progression through Claude 3.5 and into the current generation brought extended context windows, improved instruction following over long sessions, and stronger reasoning chains that maintained coherence across thousands of sequential decisions.

Second, AIG built an orchestration layer through its partnership with Palantir. Zaffino described this architecture on the Q1 call: "In close partnership with Palantir and Anthropic, we've begun the next phase of agentic AI at AIG that builds on early successes of AIG Assist." The Palantir Foundry platform provides what AIG calls an "ontology," which Zaffino defined as "a digital map of our business that included our underwriting processes, workflows and data relationships." That ontology layer is what enables agents to operate for extended periods without drifting from business logic, because the guardrails are encoded in the data relationships rather than in human attention.

Third, the multi-agent architecture means individual agents operate within specialized domains rather than attempting general-purpose reasoning across the entire underwriting workflow. This is the architectural innovation that makes 30-hour autonomy practical, as the system decomposes complex decisions into manageable agent responsibilities.

| Milestone | Approximate Timeframe | Autonomous Duration | Key Capability |
| --- | --- | --- | --- |
| Claude 2.0 initial deployment | Late 2023/Early 2024 | <1 hour | Single-submission processing |
| AIG Assist pilot (nonprofit business products) | Q1 2025 | 1-3 hours | Submission ingestion and triage |
| Eight-line production deployment | Q4 2025 | 8-12 hours | End-to-end quote generation |
| Multi-agent orchestration | Q1 2026 | Up to 30 hours | Multi-agent collaboration with synthesis |

The jump from 8-12 hours to 30 hours coincides with AIG's deployment of what Coverager described as three agent categories beyond the core underwriting agents: knowledge assistants providing real-time information, adviser agents drawing on historical case insights, and critic agents that challenge recommendations before they surface to underwriters. That critic-agent layer is worth noting for actuaries because it introduces a form of automated peer review within the agent system itself.

How AIG Structures Multi-Agent Collaboration

AIG's Q1 2026 disclosure provided the most granular public description of a multi-agent insurance underwriting system that any carrier has released. Zaffino outlined four specialized agent roles:

  1. Submission Ingestion Agent: Handles extraction and normalization of data from incoming submissions, processing the unstructured documents that characterize E&S market submissions. AIG processed over 370,000 submissions through this pipeline in 2025.
  2. Risk Evaluation Agent: Assesses each submission against underwriting guidelines, flagging exposures, exclusions, and coverage gaps. This is where AIG's proprietary underwriting logic intersects with the LLM's reasoning capabilities.
  3. Pricing Benchmarking Agent: Compares each submission against portfolio targets, historical pricing, and competitive positioning data.
  4. Collaboration (Synthesis) Agent: Integrates outputs from the other agents, operating "at machine speed and with inherent consistency," as Zaffino described it, to produce a unified recommendation for the human underwriter.

The synthesis agent is the architectural element that enables extended autonomous operation. Rather than requiring a human to coordinate outputs from multiple analytical processes, the synthesis agent maintains state across all four workstreams and produces a coherent recommendation package. When Zaffino says agents operate for 30 hours, this likely refers to the synthesis agent coordinating continuous processing across a large submission batch, with the specialized agents handling individual tasks within that window.
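
As a rough illustration of this coordination pattern, the four roles can be sketched as a pipeline in which specialized agents each return a structured output and a synthesis step integrates them while preserving a per-agent trace. The function names, fields, and referral logic below are hypothetical stand-ins for illustration, not AIG's implementation:

```python
from dataclasses import dataclass

@dataclass
class Submission:
    submission_id: str
    raw_text: str

@dataclass
class AgentOutput:
    agent: str
    findings: dict

def ingestion_agent(sub: Submission) -> AgentOutput:
    # Extract and normalize fields from the unstructured submission (stubbed).
    return AgentOutput("ingestion", {"normalized": sub.raw_text.strip().lower()})

def risk_evaluation_agent(ingested: AgentOutput) -> AgentOutput:
    # Flag exposures against underwriting guidelines (stubbed as a keyword check).
    flagged = "flood" in ingested.findings["normalized"]
    return AgentOutput("risk_evaluation", {"exposure_flagged": flagged})

def pricing_agent(ingested: AgentOutput) -> AgentOutput:
    # Benchmark against portfolio targets (stubbed as a flat rate indication).
    return AgentOutput("pricing", {"indicated_rate": 1.05})

def synthesis_agent(sub: Submission, outputs: list[AgentOutput]) -> dict:
    # Integrate the specialized outputs into one recommendation package,
    # keeping a per-agent trace so the final decision stays auditable.
    trace = {o.agent: o.findings for o in outputs}
    return {
        "submission_id": sub.submission_id,
        "trace": trace,
        "refer_to_human": trace["risk_evaluation"]["exposure_flagged"],
    }

sub = Submission("S-001", "  Warehouse property, FLOOD zone AE ")
ingested = ingestion_agent(sub)
rec = synthesis_agent(sub, [ingested, risk_evaluation_agent(ingested), pricing_agent(ingested)])
```

The design point the sketch makes is that the synthesis step, not any individual agent, owns the recommendation and the trace, which is what allows it to keep state across a long batch.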

The Palantir Foundry ontology provides the constraint framework. As Zaffino explained at the August 2025 Investor Day: "Our ontology will create a clear record of any actions taken, which will inform business logic and provide the ability to audit agents' activities." That statement is significant for governance because it positions the ontology as both a business logic layer and an audit mechanism. Every agent action is recorded against the ontology's data model, creating a deterministic log even when the LLM's reasoning is probabilistic.

The McGill and Partners deployment illustrates how this architecture extends beyond internal operations. AIG partnered with Palantir to build an ontology of McGill's portfolio, deploying 25% capacity across up to $1.6 billion in gross premiums written. The system generates near real-time insights on exposures, limit deployment, modeled risk outputs, and loss information. When an agent system manages capacity decisions across $1.6 billion in follow-market premiums, the governance question is no longer theoretical.

The "Human in the Loop" Question

Every major AI governance framework in insurance presumes some form of human oversight. The question that 30-hour autonomous operation raises is whether the current oversight models are structurally adequate, or whether they require fundamental redesign.

NAIC Governance Frameworks

The NAIC's Big Data and Artificial Intelligence (H) Working Group addressed agentic AI directly at its Spring 2026 meeting in San Diego (March 22-25, 2026). PwC presented a critical distinction to the working group: generative AI produces content, while "AI agents make decisions, manage workflows, and execute tasks." The working group identified four specific risk categories for agentic systems:

  • Accountability challenges when autonomous systems make decisions without direct human instruction
  • Cascading errors across agent chains, where one agent's output becomes another's input without human verification
  • Technological limitations in current LLM architectures, including hallucination risk during extended operation
  • Governance redesign needs, including human-in-the-loop escalation for high-risk scenarios

The NAIC also presented a four-tier risk taxonomy at the Spring meeting: unacceptable, high, medium, and low risk. Under this framework, autonomous underwriting decisions would likely fall into the high-risk category, requiring documentation of governance through reports covering executive summaries, data sources, risk assessments, model inventories, governance structures, model drift validation, bias testing, and consumer complaint processes.

As of March 2026, 24 states plus the District of Columbia have implemented the NAIC Model Bulletin on AI, originally adopted in December 2023. Eleven states (Colorado, Connecticut, Florida, Iowa, Louisiana, Maryland, Pennsylvania, Rhode Island, Vermont, Virginia, and Wisconsin) are participating in a pilot of the AI Systems Evaluation Tool, running from March through September 2026. That pilot tests the regulatory inspection framework across market conduct exams, financial exams, financial analyses, and general regulatory inquiries.

The fundamental gap: the Model Bulletin was drafted before 30-hour autonomous agents existed. Its oversight provisions assume a human reviews AI outputs before they affect policyholders. When agents operate overnight, process hundreds of submissions, and produce binding recommendations by morning, the review cadence implied by the bulletin may not match operational reality.

EU AI Act Article 14: Human Oversight Requirements

The EU AI Act provides more prescriptive requirements. Article 14 mandates that high-risk AI systems "can be effectively overseen by natural persons during the period in which they are in use." The implementing provisions specify that oversight measures "shall be commensurate with the risks, level of autonomy and context of use."

For insurance, the classification is explicit: AI systems used for risk assessment and pricing in life and health insurance are classified as high-risk under Annex III, with a compliance deadline of August 2, 2026. Deployers must assign trained personnel who possess both insurance expertise and AI literacy, can detect anomalies and question outputs, can override automated decisions, and can report suspected biases through documented escalation processes.

The "during the period in which they are in use" language in Article 14 creates a direct tension with 30-hour autonomous operation. If an agent system operates from 6 PM Friday through midnight Saturday, Article 14 arguably requires qualified oversight personnel available throughout that window. Carriers operating in EU markets, or EU-based reinsurers participating in AIG's programs, will need to address this question before the August 2026 enforcement date.

| Framework | Oversight Requirement | 30-Hour Agent Challenge |
| --- | --- | --- |
| NAIC Model Bulletin (24 states + DC) | Human review of AI outputs before consumer impact | Batch processing may produce hundreds of decisions before review |
| EU AI Act Article 14 | Effective human oversight "during the period" of use | Overnight/weekend operation requires continuous staffing |
| NAIC Four-Tier Risk Taxonomy | High-risk systems need comprehensive governance documentation | Agent chain complexity exceeds current documentation templates |
| Colorado AI Act (effective June 30, 2026) | Impact assessments for high-risk AI decisions | Assessment frequency unclear for continuous agent operation |
| IAIS Application Paper (July 2025) | Proportional governance based on existing ICPs | Principles-based approach lacks specific agent-duration guidance |

Audit Trail Design for Extended Agent Cycles

If regulators and actuaries cannot watch agents in real time during 30-hour cycles, the audit trail becomes the primary governance mechanism. The quality of that trail determines whether extended autonomous operation is defensible in regulatory exams, rate filings, and reserve opinions.

Zaffino signaled AIG's approach at the August 2025 Investor Day: the Palantir ontology creates "a clear record of any actions taken" that informs business logic and provides audit capability. On the Q1 2026 call, he added that AIG "will be able to monitor each agent's activity and intervene in real time if needed." Those two statements describe a dual-layer audit architecture: a deterministic action log from the ontology, combined with real-time monitoring capability for human intervention.

For actuaries evaluating or building audit systems for agentic AI, three design requirements emerge from the current regulatory landscape:

Logging granularity: Every agent decision, data access, and output must be recorded with timestamps, input data references, and the agent's reasoning chain. The NAIC's evaluation tool pilot suggests regulators will expect to trace any individual underwriting decision back through the full agent chain, from submission ingestion through risk evaluation, pricing, and synthesis. In a 30-hour cycle processing hundreds of submissions, this produces a substantial data volume that itself requires governance around retention, access controls, and query capabilities.
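
One way to meet that granularity requirement is an append-only, hash-chained action log, so that after-the-fact edits to a 30-hour record are detectable in an exam. The schema below is an assumption for illustration, not the Foundry ontology's actual record format:

```python
import hashlib
import json
from datetime import datetime, timezone

# Illustrative append-only audit record for one agent action; all field
# names here are assumptions, not any vendor's schema.

def log_agent_action(log: list, agent: str, submission_id: str,
                     inputs: dict, decision: str, reasoning: list) -> dict:
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "submission_id": submission_id,
        "input_refs": sorted(inputs),          # references to documents, not payloads
        "decision": decision,
        "reasoning_chain": reasoning,
        # Chain each entry to the previous one's hash so the log is tamper-evident.
        "prev_hash": log[-1]["hash"] if log else None,
    }
    entry["hash"] = hashlib.sha256(
        json.dumps({k: v for k, v in entry.items() if k != "hash"},
                   sort_keys=True, default=str).encode()
    ).hexdigest()
    log.append(entry)
    return entry

audit_log: list = []
log_agent_action(audit_log, "risk_evaluation", "S-001",
                 {"loss_runs": "doc-17", "sov": "doc-18"},
                 "flag_exposure", ["flood zone AE", "no sublimit in guidelines"])
```

Storing input references rather than payloads keeps the log queryable at volume while the retention and access-control questions noted above apply to the referenced documents.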

Explainability at the decision level: The IAIS Application Paper on AI Supervision, published July 2, 2025, emphasizes transparency and explainability as one of its five key supervisory topic areas. For a multi-agent system, explainability requires not just the final recommendation but the contribution of each specialized agent. If the risk evaluation agent flagged an exposure that the pricing agent overweighted, the synthesis agent's reasoning for its final recommendation must be traceable. This is particularly important for E&S lines, where the absence of rate filing requirements does not eliminate the need for defensible pricing documentation.

Reproducibility and drift detection: An actuarial opinion on reserves or a rate filing that incorporates AI-generated underwriting decisions needs to demonstrate that those decisions were consistent with the carrier's stated underwriting appetite. Model drift, where agent behavior shifts over extended operation as context accumulates, represents a specific risk in 30-hour cycles. Agents that produce different risk assessments at hour 28 than they would at hour 2 for identical submissions present a fairness and consistency problem that regulators at the NAIC and under the EU AI Act would flag.
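
A minimal drift probe along these lines, assuming the carrier maintains a fixed set of control submissions it can re-score at intervals during the cycle (the tolerance and scores below are invented for illustration):

```python
# Sketch of a drift probe: periodically re-score fixed control submissions
# during an extended cycle and compare against the hour-0 baseline.
# The scoring function, tolerance, and values are hypothetical.

def detect_drift(baseline: dict, current: dict, tolerance: float = 0.02) -> list:
    """Return the control submissions whose risk score moved more than
    `tolerance` from baseline; a non-empty result should pause the cycle
    for human review."""
    return [sid for sid, score in current.items()
            if abs(score - baseline[sid]) > tolerance]

baseline_scores = {"CTRL-1": 0.41, "CTRL-2": 0.77}   # scored at hour 0
hour_28_scores  = {"CTRL-1": 0.44, "CTRL-2": 0.78}   # replayed at hour 28

drifted = detect_drift(baseline_scores, hour_28_scores)
# CTRL-1 moved by 0.03, beyond the 0.02 tolerance, so it is flagged.
```

Because the control submissions are identical across replays, any score movement isolates behavioral drift from mix-of-business effects.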

Model Risk Management Implications for Actuaries

The 30-hour autonomous agent introduces specific challenges for actuarial model risk management that go beyond traditional model validation frameworks.

The Banking Comparison

Banking regulators provide a useful reference point. The OCC, Fed, and FDIC released revised Model Risk Management guidance on April 17, 2026 (OCC Bulletin 2026-13), replacing the SR 11-7 framework that had governed bank model risk since 2011. Notably, the revised guidance explicitly excludes generative and agentic AI, characterizing these systems as "novel and rapidly evolving" and outside the current scope. The agencies plan a separate Request for Information on AI use by banks "in the near future."

That exclusion is instructive for insurance. If banking regulators with a 15-year model risk management tradition chose to defer on agentic AI rather than force it into existing frameworks, insurance regulators working from a two-year-old Model Bulletin face an even wider gap. The banking guidance also narrowed its model definition to require "complexity," excluding simple arithmetic and deterministic rule-based processes. Agentic AI systems clearly meet the complexity threshold, but the multi-agent architecture complicates the question of what constitutes a single "model" for validation purposes.

The Healthcare Comparison

Healthcare offers a different model. The FDA's framework for AI-enabled medical devices uses three autonomy tiers: Tier 1 (informing/suggesting), where AI surfaces recommendations and clinicians decide; Tier 2 (driving/diagnosing), where AI produces diagnoses with limited override; and Tier 3 (treating/closing the loop), where AI autonomously actuates therapeutic decisions. Over 1,250 AI-enabled medical devices had received FDA authorization by July 2025.

The FDA's Predetermined Change Control Plans, finalized in December 2024, allow manufacturers to pre-approve AI algorithm updates without filing new marketing submissions, provided they include bias mitigation, reference standards, post-market monitoring, and rollback plans. That framework accepts continuous model evolution in exchange for structured change management. Insurance regulators have not yet developed an equivalent mechanism for underwriting AI that learns and adapts during operation.

AIG's multi-agent system maps roughly to FDA Tier 2: the agents produce structured recommendations that underwriters can override. But the 30-hour operating window and the volume of decisions processed within that window push toward Tier 3 in practice, because meaningful human review of every recommendation becomes impractical at scale.

Reserve and Pricing Implications

For reserving actuaries, the question is how to treat underwriting decisions influenced by AI agents in loss reserve estimates. If an agent system approved submissions that a human underwriter would have declined, or priced risks differently than the carrier's filed rate plan contemplated, the resulting book of business may have different loss characteristics than the historical data used in reserve models. ASOP No. 43 (Property/Casualty Unpaid Claim Estimates) requires the actuary to consider "the nature of the exposure base" and "significant changes in conditions or trends." A shift from human to agentic underwriting qualifies as a significant change in conditions.

For pricing actuaries, the concern is whether AI agent decisions during 30-hour cycles are consistent with filed rating plans and underwriting guidelines. Even in E&S markets where prior approval is not required, actuaries certifying rate adequacy need to verify that the agents' pricing recommendations align with the carrier's intended rate level. The AIG Assist metrics (30% more quotes, 40% more binds) suggest the agents are capturing business that human underwriters were not reaching. Whether that incremental business carries the same expected loss ratio as the existing book is an empirical question that pricing actuaries should model explicitly.
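
That empirical question can be made concrete by tracking loss ratio emergence separately by origination cohort. The sketch below uses invented figures purely to show the shape of the comparison:

```python
# Illustrative cohort comparison: loss ratios for agent-originated versus
# human-originated business. All premium and loss figures are invented.

def loss_ratio(policies: list) -> float:
    earned = sum(p["earned_premium"] for p in policies)
    incurred = sum(p["incurred_loss"] for p in policies)
    return incurred / earned

book = [
    {"origin": "agent", "earned_premium": 120.0, "incurred_loss": 78.0},
    {"origin": "agent", "earned_premium": 200.0, "incurred_loss": 118.0},
    {"origin": "human", "earned_premium": 150.0, "incurred_loss": 90.0},
    {"origin": "human", "earned_premium": 180.0, "incurred_loss": 112.0},
]

by_origin = {
    origin: loss_ratio([p for p in book if p["origin"] == origin])
    for origin in {"agent", "human"}
}
# A material gap between the two cohorts would argue for treating the
# agent-originated business explicitly in the rate-level indication
# rather than pooling it with historical human-underwritten experience.
```

At low volumes the cohorts will need a credibility weighting before any conclusion is drawn, but the segmentation itself is the prerequisite.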

For regulatory filings, ASOP No. 56 (Modeling) applies to any actuarial analysis that relies on AI model outputs. An actuary who incorporates agent-generated underwriting data into a rate filing or reserve analysis is responsible for understanding the model's limitations, testing its outputs, and disclosing its use. The 30-hour autonomous operation window adds a dimension to that disclosure: the actuary should understand not just how the model works in general, but how its behavior may vary across extended operating cycles.

What Adequate Governance Looks Like

The gap between AIG's operational reality and existing governance frameworks does not mean 30-hour autonomous operation is inherently ungovernable. It means the governance model needs to shift from continuous human observation to structured checkpoint and exception-based oversight. Based on the frameworks that banking and healthcare regulators have developed, and the specific risks the NAIC identified at its Spring 2026 meeting, an adequate governance model for extended-duration agentic AI in insurance would include:

  • Defined escalation thresholds: Agents should flag decisions that exceed predetermined risk parameters for human review, regardless of where they fall in a 30-hour cycle. AIG indicated this capability exists: the ability to "monitor each agent's activity and intervene in real time if needed." The governance question is whether the thresholds are calibrated to catch the right decisions.
  • Periodic automated consistency checks: At defined intervals during extended operation, the system should compare agent outputs against baseline performance metrics. Drift beyond acceptable bounds should trigger automatic pause and human review. This addresses the NAIC's concern about cascading errors across agent chains.
  • Post-cycle review protocols: After each 30-hour cycle, a structured review of agent decisions should cover distribution of accepted versus declined submissions, pricing deviation from guidelines, flagged exceptions and their resolution, and any anomalies in agent reasoning chains.
  • Actuarial validation of book composition: Pricing actuaries should monitor whether the mix of business produced during extended agent cycles differs materially from human-underwritten business. Loss ratio emergence by agent cycle versus human cycle provides the critical metric.
  • Regulatory documentation that addresses duration: Rate filings, reserve analyses, and model governance documentation should explicitly address the duration of autonomous operation and the controls in place during that window. Regulators participating in the NAIC evaluation pilot will likely ask this question; carriers that have prepared documentation will have a smoother exam experience.
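
The first two controls above can be sketched as simple threshold and checkpoint functions. The parameter values are illustrative, not any carrier's calibration:

```python
# Hypothetical escalation check combining two of the controls above:
# per-decision thresholds and a periodic consistency checkpoint.

ESCALATION_RULES = {
    "max_limit": 5_000_000,         # limits above this always go to a human
    "max_pricing_deviation": 0.10,  # allowed deviation from guideline rate
}

def requires_escalation(decision: dict) -> bool:
    # Flag any decision that exceeds predetermined risk parameters,
    # regardless of where it falls in the 30-hour cycle.
    return (decision["limit"] > ESCALATION_RULES["max_limit"]
            or abs(decision["pricing_deviation"]) > ESCALATION_RULES["max_pricing_deviation"])

def checkpoint_indices(decisions: list, interval: int = 100) -> list:
    """Indices at which the cycle pauses for an automated consistency
    check against baseline metrics (every `interval` decisions)."""
    return list(range(interval, len(decisions) + 1, interval))

d = {"limit": 7_500_000, "pricing_deviation": 0.04}
# The oversized limit trips the threshold even though pricing is in bounds.
```

The governance question the article raises, whether the thresholds catch the right decisions, is a calibration exercise on top of this mechanism, not a property of the mechanism itself.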

Why This Matters for the Profession

The 30-hour autonomous agent is not a theoretical construct. AIG has disclosed it in a public earnings call, attached to a quarter where it delivered $774 million in underwriting income at an 87.3% combined ratio. Competitors are watching. Travelers has deployed 10,000 Anthropic AI assistants. Chubb disclosed nine AI projects and is pursuing a 20% workforce reduction target. The trajectory across the industry points toward longer autonomous operating windows, more complex multi-agent architectures, and a progressively larger share of underwriting decisions flowing through AI systems.

For actuaries, the professional obligation is clear. ASOP No. 56 already requires actuaries to understand models used in their analyses. When those models operate autonomously for 30 hours and make thousands of decisions between human checkpoints, understanding the model requires understanding the governance framework around it. Actuaries who validate, price, or reserve for AI-underwritten business without examining the oversight architecture are not meeting the standard of practice that regulators, and the profession's own standards, require.

The NAIC has flagged the governance gap. The EU AI Act's August 2026 deadline approaches. Banking regulators have acknowledged they are not ready to regulate agentic AI and plan a separate rulemaking. In that regulatory vacuum, the carriers and actuaries who build adequate governance now will set the standard that regulators eventually codify.