From examining the data provenance disclosures in six recent state rate filings that cited ML-based models, a consistent gap emerges: carriers describe model architecture in detail but remain vague about how training data was sourced, anonymized, and validated for representativeness. One filing referenced "historical claims data from 2018-2023" without specifying anonymization procedures, consent frameworks, or whether the data had been tested for demographic representativeness. Another disclosed "proprietary policyholder interaction data" as a training source with no mention of privacy-preserving techniques applied during model development.
This documentation gap is about to become a regulatory problem. The NAIC's Third-Party Data and Models Working Group, at its March 23, 2026 meeting in San Diego, finalized a vendor registration framework that requires every registered AI vendor to disclose training data sources and date ranges, documented testing methodology (including bias testing), known limitations, and change-management practices. Carriers remain accountable for the models they deploy, regardless of a vendor's registration status. The registry creates transparency; it does not create a safe harbor.
The timing creates urgency. Twenty-two percent of insurers plan to have an agentic AI solution in production by year-end 2026, according to Celent's third annual GenAI survey. Agentic AI systems, which execute multi-step workflows autonomously, require richer training data than traditional ML models. They need realistic claims conversations, underwriting submission sequences, billing dispute patterns, and policyholder interaction transcripts to learn effective autonomous behavior. The data they need is precisely the data that privacy laws most aggressively protect.
The Three-Way Tension: Data Hunger, Privacy Law, and Regulatory Disclosure
Agentic AI creates a training data problem that simpler ML models do not face. A traditional pricing model trains on structured tabular data: loss amounts, exposure counts, policy characteristics. An agentic claims handler trains on conversation sequences, document chains, decision trees, and multi-party interactions. The training corpus must include realistic policyholder language, claim-specific context, coverage disputes, and resolution patterns. These are personal data by any statutory definition.
The constraint tightens from three directions simultaneously.
State privacy laws. As of March 2026, twenty U.S. states have comprehensive privacy laws in effect. California's AB 2013, the Generative AI Training Data Transparency Act effective January 1, 2026, requires publicly accessible generative AI systems trained on personal information to publish training dataset disclosures covering sources, data types, intellectual property status, and personal information content. More than thirty-five states have AI legislation actively advancing, and in 2026 alone new privacy laws took effect in Indiana, Kentucky, and Rhode Island.
The NAIC vendor framework. The registration regime covers pricing, underwriting, claims, utilization reviews, marketing, and fraud detection with direct consumer impact. For each covered function, the framework requires disclosure of training data sources and date ranges, documented testing methodology including bias testing where applicable, known limitations, supported and unsupported use cases, and a contact for regulator inquiries. First state implementations are expected late 2026 or early 2027, with NAIC adoption anticipated at the Fall National Meeting in November 2026.
The 12-state AI evaluation pilot. Running March through September 2026 across California, Colorado, Connecticut, Florida, Iowa, Louisiana, Maryland, Pennsylvania, Rhode Island, Vermont, Virginia, and Wisconsin, the pilot's Exhibit D specifically examines data sources, quality controls, representativeness, and potential for proxy discrimination. Each high-risk system requires documentation of model design, training data, validation procedures, performance metrics, and bias testing results.
The combined effect: carriers must train AI systems on realistic policyholder data, cannot freely use that data without privacy compliance, and must now document exactly what data they used and how they protected it. The documentation requirement transforms what was previously a best-practice aspiration into a regulatory obligation with examination consequences.
Why 77% of Carriers Cannot Meet the Disclosure Standard Today
Industry surveys quantify the gap between what regulators will require and what carriers can currently produce. Seventy-eight percent of organizations cannot validate data before it enters AI training pipelines. Seventy-seven percent cannot trace where their training data originated. Fifty-three percent have no mechanism to recover or remove training data after an incident. These figures span industries but align with patterns observed in insurance-specific governance assessments.
The provenance problem compounds at each layer of the training pipeline. A carrier building an agentic claims handler might assemble training data from five sources: historical claims adjuster notes (internal), call transcripts (internal, potentially recorded under varying consent frameworks by state), vendor-provided synthetic scenarios (third-party), publicly available regulatory filings (public), and policyholder correspondence (internal, subject to state-specific retention and use restrictions). Documenting the lineage, consent basis, anonymization method, and representativeness testing for each source, at the granularity the NAIC framework expects, requires infrastructure that most carriers have not built.
From tracking carrier AI governance programs over the past eighteen months, the pattern is consistent. Data science teams assemble training datasets pragmatically, optimizing for model performance. They pull from whatever internal data stores are accessible. Privacy review, when it occurs, happens after the dataset is assembled rather than as a gate at collection. Lineage metadata is not captured at ingestion because the tooling does not exist in most carrier data environments. When a regulator or auditor later asks "what data trained this model and how was it anonymized," the honest answer for most production systems is "we would need to reconstruct that information, and we may not be able to do so completely."
Grant Thornton's 2026 AI Impact Survey reinforces this: only 24% of insurance executives believe their organization could pass an independent AI governance review within 90 days. Training data provenance is among the most difficult artifacts to produce retroactively because the absence of lineage metadata at the point of data collection cannot be filled in later with certainty.
Synthetic Data: Manufacturing Compliant Training Corpora
Synthetic data generation offers the most direct solution to the training data privacy problem. Rather than anonymizing real policyholder data (which carries residual re-identification risk), carriers generate entirely artificial datasets that preserve the statistical properties and distributional characteristics of real data without containing any actual policyholder information.
Gartner projects that by 2026, 75% of businesses will use generative AI to create synthetic customer data, up from less than 5% in 2023. For insurance specifically, Gartner forecasts that by 2027, 40% of the AI algorithms insurers use will rely on synthetic data to ensure fairness and comply with regulations.
Techniques in Production
Four primary approaches dominate insurance synthetic data generation:
Conditional Tabular GANs (CTGANs). Neural networks that learn the joint distribution of structured insurance data and generate new samples preserving correlations between fields. MAPFRE deployed CTGANs through the Synthetic Data Vault platform to generate synthetic property fraud claims data. The motivation was specific: property fraud is both costly and rare, creating insufficient training examples for detection models. By augmenting real data with synthetic samples preserving the statistical distribution of fraudulent claims, MAPFRE increased fraud detection recall by 31% with estimated savings of $310,000 per 100 fraudulent claims identified.
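To make the workflow concrete, here is a minimal sketch using DataCebo's open-source Synthetic Data Vault (SDV) library, the same framework in the MAPFRE case study. The file name, column names, and sampling volume are hypothetical, and the API shown reflects SDV's 1.x releases:

```python
# Minimal sketch: augmenting rare fraud examples with CTGAN via SDV.
# "property_claims.csv" and its columns are hypothetical.
import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.single_table import CTGANSynthesizer

real_claims = pd.read_csv("property_claims.csv")
# Fraud is rare, so train the generator on the fraud class alone.
fraud_only = real_claims[real_claims["fraud_flag"] == 1].drop(columns=["fraud_flag"])

# Describe the table so CTGAN knows each column's type.
metadata = SingleTableMetadata()
metadata.detect_from_dataframe(fraud_only)

# Fit the GAN and sample synthetic fraud records that preserve
# cross-field correlations without copying any real row.
synthesizer = CTGANSynthesizer(metadata, epochs=300)
synthesizer.fit(fraud_only)
synthetic_fraud = synthesizer.sample(num_rows=5_000).assign(fraud_flag=1)

# Augmented training set for the downstream fraud detection model.
training_set = pd.concat([real_claims, synthetic_fraud], ignore_index=True)
```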
Variational Autoencoders (VAEs). Compress real policyholder interaction data into a latent representation, then reconstruct synthetic instances from that compressed space. The reconstruction process introduces sufficient variation that individual records cannot be traced back to source policyholders while preserving the aggregate statistical structure actuaries need for model validation.
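The mechanism is easy to see in a compact PyTorch sketch; layer sizes are illustrative, the model is shown untrained, and a production tabular VAE would add per-column handling for categorical fields:

```python
# Minimal tabular VAE sketch: encode real interaction features into a
# latent space, then decode synthetic variants from that space.
import torch
import torch.nn as nn

class TabularVAE(nn.Module):
    def __init__(self, n_features: int, latent_dim: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU())
        self.to_mu = nn.Linear(64, latent_dim)
        self.to_logvar = nn.Linear(64, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, n_features)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization: sampled noise is the variation that breaks
        # the link between a synthetic record and any source policyholder.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

def generate(model: TabularVAE, n: int, latent_dim: int = 8) -> torch.Tensor:
    # Sample from the prior to produce wholly synthetic records.
    with torch.no_grad():
        return model.decoder(torch.randn(n, latent_dim))

model = TabularVAE(n_features=12)   # would be fit on real features first
synthetic = generate(model, n=1_000)
```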
Transformer and diffusion models. The most recent generation of synthetic data techniques, capable of generating highly realistic sequential data including conversation flows, document chains, and multi-step interaction patterns. These approaches are particularly relevant for agentic AI training because they can produce realistic multi-turn claims conversations, underwriting submission sequences, and policy service interactions without sourcing from actual policyholder transcripts.
Statistical simulation with actuarial constraints. Classical Monte Carlo and copula-based approaches that generate synthetic portfolios preserving known actuarial distributions (frequency, severity, development patterns) without machine learning. These methods produce training data for pricing and reserving models where the distributional properties are well-characterized from aggregate industry data.
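As a sketch of the classical approach, the following simulates a synthetic portfolio with a Gaussian copula linking Poisson claim counts to lognormal severities. All parameters are illustrative, not calibrated to any real book, and the single severity draw per policy is a deliberate simplification:

```python
# Monte Carlo synthetic portfolio: a Gaussian copula couples the
# frequency and severity marginals. Parameters are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_policies = 100_000
rho = 0.3  # assumed frequency-severity dependence

# Correlated uniforms via the Gaussian copula.
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_policies)
u = stats.norm.cdf(z)

# Marginals matched to assumed actuarial distributions.
claim_counts = stats.poisson.ppf(u[:, 0], mu=0.08).astype(int)
severity = stats.lognorm.ppf(u[:, 1], s=1.2, scale=np.exp(8.5))

# Simplification: one severity draw applied to each policy's claims.
portfolio_losses = claim_counts * severity
print(f"Simulated aggregate loss: {portfolio_losses.sum():,.0f}")
```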
The Representativeness Challenge
Synthetic data solves the privacy problem but introduces a representativeness problem that the NAIC framework specifically addresses. Exhibit D of the AI evaluation tool examines data representativeness and potential for proxy discrimination. A synthetic dataset that faithfully reproduces the statistical properties of a carrier's historical book will also reproduce any historical biases embedded in that book.
If a carrier's historical claims data underrepresents certain geographic areas or demographic groups, synthetic data generated from that history will perpetuate the gap. The NAIC's bias testing requirement applies regardless of whether training data is real or synthetic. Carriers cannot treat synthetic data as automatically bias-free simply because it contains no real individuals.
From examining rate filing documentation, we have seen carriers assert that synthetic data eliminates privacy risk without addressing whether the synthetic generation process was tested for demographic representativeness. This gap will attract regulatory scrutiny as the evaluation tool pilot matures. The defensible position requires documenting both that synthetic data contains no real policyholder information and that the generation process was validated against representativeness benchmarks.
Vendor Landscape
Several vendors now offer insurance-specific synthetic data platforms. Hazy focuses on differential privacy guarantees at the enterprise level, positioning for carriers where compliance and governance documentation are primary requirements. Syntho emphasizes de-identification, removing all PII and replacing it with artificial identifiers while preserving statistical relationships. DEDOMENA combines synthetic data generation with data enrichment. DataCebo provides the open-source Synthetic Data Vault framework including the CTGAN architecture MAPFRE validated.
The vendor choice carries regulatory implications. Under the NAIC framework, vendors providing synthetic data generation tools that feed into consumer-impacting AI systems may themselves need to register and disclose their methodology. Carriers should evaluate whether their synthetic data vendor's approach produces documentation sufficient for both the carrier's own governance requirements and the vendor's potential registration obligations.
Federated Learning: Training Without Centralizing Data
Federated learning takes a fundamentally different approach. Rather than moving data to the model, it moves the model to the data. Each participating entity trains a local copy of the model on its own policyholder data. Only encrypted, anonymized model parameter updates are shared with a central coordinator. No raw policyholder data leaves the carrier's environment.
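A minimal sketch of the core federated averaging (FedAvg) loop makes the architecture concrete. The three carriers, their datasets, and the linear loss model are hypothetical; the point is that only parameter vectors ever reach the coordinator:

```python
# FedAvg sketch in NumPy: each carrier trains locally; the coordinator
# only ever sees parameter vectors, never raw policyholder records.
import numpy as np

def local_update(weights, X, y, lr=0.01, epochs=5):
    """One carrier's local training: gradient steps on its own data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # squared-error gradient
        w -= lr * grad
    return w

def federated_round(global_w, carriers):
    """Aggregate local parameters, weighted by local dataset size."""
    local_ws, sizes = [], []
    for X, y in carriers:  # each (X, y) stays in its own environment
        local_ws.append(local_update(global_w, X, y))
        sizes.append(len(y))
    return np.average(local_ws, axis=0, weights=np.array(sizes, float))

# Three hypothetical carriers with private datasets drawn from a
# shared underlying loss relationship.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
carriers = []
for _ in range(3):
    X = rng.normal(size=(500, 3))
    carriers.append((X, X @ true_w + rng.normal(scale=0.1, size=500)))

w = np.zeros(3)
for _ in range(50):
    w = federated_round(w, carriers)
print("Learned weights:", np.round(w, 3))  # approaches true_w
```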
The Society of Actuaries published research on federated learning for insurance companies in 2024, conducted with Colorado State University. The study demonstrated through a loss modeling case study that federated learning can leverage data from multiple entities to improve prediction accuracy, particularly for rare events or new markets where single-carrier data is insufficient. The SOA research specifically noted that heterogeneity of data across carriers, including diverse formats, quality levels, and data types, makes harmonization nontrivial but not insurmountable.
The Hong Kong Proof of Concept
The Hong Kong Applied Science and Technology Research Institute (ASTRI), working with the Insurance Authority, published a whitepaper in November 2025 documenting a functional federated learning platform for insurance deployment. The proof of concept demonstrated two use cases: claims forecasting using medical and clinical data across participants, and renewal prediction using financial and credit datasets. Insurance members actively participated in working group discussions, experiments, and implementation. The platform demonstrated that carriers could improve predictive accuracy on shared problems without any participant accessing another's raw policyholder data.
Cross-Jurisdictional Training
Federated learning's strongest regulatory advantage emerges in cross-border contexts. An underwriting algorithm can learn from claims patterns across Singapore, the EU, and the U.S. without raw data crossing jurisdictional boundaries. This matters for global carriers operating under simultaneously applicable privacy regimes: GDPR restricts transfers of European policyholder data to U.S. servers, CCPA restricts how California consumer data can be processed, and Singapore's PDPA imposes its own constraints. Federated learning aligns with all three frameworks by design because the raw data never moves; only model parameters cross borders.
Manuel Rodriguez Vera of Capgemini's WNS unit, writing in Carrier Management in April 2026, recommended federated learning as one of the practical approaches for insurers to "embed control, privacy and compliance into AI systems while enabling collaboration across jurisdictions." The approach converts what would otherwise be a data-sharing negotiation (with legal review, data processing agreements, and transfer impact assessments) into a model-parameter exchange that carries no personal data.
Limitations for Agentic AI Training
Federated learning works well for structured prediction tasks: loss modeling, pricing, fraud scoring. It works less naturally for training agentic AI systems that need to learn conversational behavior, document processing sequences, or multi-step decision workflows. These training tasks typically require the model to process full interaction sequences in context, which federated architectures can accommodate but with additional complexity in maintaining sequence coherence across distributed training rounds.
The practical implication: federated learning is a strong solution for the structured components of agentic AI training (the scoring models, the risk assessment modules, the routing classifiers) but a weaker fit for the conversational and workflow components that define agent behavior. Most carriers implementing agentic AI will need federated learning for some training tasks and synthetic data for others.
Differential Privacy: Quantifying the Privacy Budget
Differential privacy adds calibrated statistical noise to data or model outputs, providing mathematically provable guarantees about how much information any individual record can contribute to the trained model. The key concept is the privacy budget (epsilon): a quantitative measure of maximum privacy loss any individual in the training set can experience.
In insurance applications, differential privacy operates at multiple layers. At the data layer, carriers add random variations to claims frequency counts, severity distributions, or policyholder characteristics before feeding data into model training. The noise is calibrated so that aggregate patterns (which actuaries need for pricing) are preserved while individual-level reconstruction becomes statistically impossible. At the model layer, differential privacy constrains how much any single training example can influence model parameters during gradient descent, preventing the model from memorizing specific policyholder records.
The regulatory appeal is the quantitative privacy guarantee. When a regulator asks "how did you protect policyholder data during model training," a carrier using differential privacy can provide a specific epsilon value and explain what it means: with epsilon of 1.0, the probability of any output changes by at most a factor of e (approximately 2.72) whether any single policyholder's data was included or excluded from training. This is a concrete, auditable claim that satisfies documentation requirements in ways that qualitative assertions ("we anonymized the data") cannot.
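A minimal sketch of the Laplace mechanism on a claims count query shows both the noise addition and the factor-of-e bound in code; the epsilon value and the count itself are illustrative:

```python
# Laplace mechanism sketch for a count query released into training.
import numpy as np

rng = np.random.default_rng(7)

def laplace_count(true_count: int, epsilon: float) -> float:
    # A count query has sensitivity 1: adding or removing one
    # policyholder changes the count by at most 1.
    sensitivity = 1.0
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Epsilon-DP guarantee: for any output set S and datasets D, D'
# differing in one record,
#     P[M(D) in S] <= exp(epsilon) * P[M(D') in S]
# At epsilon = 1.0, exp(1.0) ~ 2.72: the factor-of-e bound above.
noisy = laplace_count(true_count=1_437, epsilon=1.0)
print(f"Noisy claims count released for training: {noisy:.1f}")
```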
The Privacy-Utility Tradeoff
Stronger privacy guarantees (lower epsilon) require more noise, which degrades model performance. For insurance applications where actuarial precision matters, carriers must calibrate the privacy budget against acceptable prediction error. A claims triage model that routes 95% of claims correctly at epsilon 3.0 might drop to 88% at epsilon 1.0. Whether that degradation is acceptable depends on the downstream workflow: if the agent escalates uncertain cases to human adjusters, lower accuracy at the triage layer may be tolerable.
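A small simulation makes the tradeoff tangible: noise calibrated to a given epsilon is negligible on large cells but material on sparse ones (rare perils, thin geographies), which is where the utility loss concentrates. The cell counts here are illustrative:

```python
# Privacy-utility sweep: mean relative error of a sensitivity-1 count
# query under Laplace noise, across cell sizes and epsilon values.
import numpy as np

rng = np.random.default_rng(7)
for cell_count in (10_000, 500, 25):
    for epsilon in (3.0, 1.0, 0.5):
        noise = rng.laplace(scale=1.0 / epsilon, size=50_000)
        rel_err = np.mean(np.abs(noise)) / cell_count
        print(f"cell={cell_count:>6}  epsilon={epsilon:<3}  "
              f"mean relative error {rel_err:.3%}")
```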
From patterns in recent regulatory filings, the emerging consensus for insurance applications clusters around epsilon values of 2.0 to 5.0 for training data and 1.0 to 3.0 for model outputs exposed to queries. These ranges reflect a practical compromise: strong enough to satisfy privacy regulators, permissive enough to maintain actuarial accuracy within acceptable bounds for pricing and reserving applications.
The Competitive Moat: Historical Data as Strategic Asset
The NAIC's disclosure requirements create an asymmetric competitive dynamic. Carriers with large, well-labeled historical claims databases can train agentic AI systems using privacy-preserving techniques applied to rich internal data. Carriers starting from limited or poorly organized internal data face a structural disadvantage that synthetic data generation alone cannot fully bridge.
Progressive illustrates the advantage. With over 14 billion miles of driving data collected and analyzed through its Snapshot telematics program, Progressive has assembled a training corpus for behavioral AI that no competitor can replicate without equivalent time investment. When Progressive applies differential privacy to this dataset for model training, the resulting models benefit from a volume and diversity of observations that produce accurate predictions even after privacy noise is applied. A carrier with one-tenth the data volume applying the same privacy budget would produce materially less accurate models.
The Hartford demonstrates the governance dimension. Having appointed Jeffery Hawkins as Chief Data, AI, and Operations Officer and elevated its technology leadership structure, Hartford established specific AI governance policies requiring that external parties develop written policies verifying that AI use is consistent with ethical obligations, establish mechanisms to identify and address potential AI biases, and ensure that nonpublic information never enters open-source AI systems. These governance requirements, published in Hartford's external-facing policies, effectively constrain how training data flows through its vendor ecosystem.
The Allstate Cautionary Tale
Allstate's subsidiary Arity demonstrates what happens when training data collection lacks adequate privacy governance. The Texas Attorney General's January 2025 lawsuit alleged that Arity collected "trillions of miles worth of location data from over 45 million consumers" by embedding tracking software in third-party apps including Life360, Routely, and Fuel Rewards. The complaint alleges Arity specifically sought apps with existing location features "to avoid alerting consumers" to the data collection.
The data was used both to underwrite Allstate policies and monetized through sale to other carriers, effectively building the "world's largest driving behavior database" without informed consent. A federal class action is proceeding alongside the state enforcement action. The case illustrates the regulatory and litigation risk when carriers (or their subsidiaries) assemble training datasets without privacy-first architecture. Under the NAIC's vendor registration framework, a vendor like Arity would need to disclose these data sources and collection methods, making the opacity that enabled the alleged conduct impossible to maintain.
Building a Compliance-Ready Data Provenance Pipeline
For carriers deploying agentic AI systems in the current regulatory environment, "good enough" for compliance means documenting five elements for every dataset that feeds model training (a sketch of a structured provenance record capturing all five follows the list):
1. Source identification and consent basis. Where did the data originate? Under what legal basis was it collected? For policyholder interaction data, was the data collected under the insurance contract terms, under a separate consent framework, or under a legitimate interest basis? The answer varies by state and data type. Call recordings require different consent documentation than policy application data.
2. Anonymization or privacy-preserving technique applied. The NAIC framework requires knowing what happened to data between collection and model training. Carriers should document the specific technique (synthetic generation, federated processing, differential privacy with stated epsilon, or traditional anonymization with k-anonymity threshold) and maintain validation records showing the technique achieved its stated privacy guarantee.
3. Representativeness testing and bias assessment. Exhibit D of the AI evaluation tool screens for demographic representativeness. Carriers should document testing that training data represents the population the model will serve, not just the population the carrier historically insured. This distinction matters: historical data reflects historical underwriting decisions, which may have systematically excluded certain populations.
4. Date ranges and currency. The NAIC framework specifically requires training data date ranges. This addresses model staleness: a claims handler trained on 2019-2021 interactions may not reflect current policyholder expectations, regulatory requirements, or coverage structures that changed subsequently. Documenting date ranges also enables regulators to assess whether the training period included anomalous events (pandemic years, catastrophe years) that might skew model behavior.
5. Change management and retraining protocols. The framework requires documented change-management practices. For agentic AI systems that learn continuously or undergo periodic retraining, carriers must document how new data enters the training pipeline, what privacy gates it passes through, and how model behavior changes are validated before production deployment.
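As noted above, here is a minimal sketch of how these five elements might be captured as a structured, queryable provenance record. The field names are hypothetical, not a prescribed NAIC schema:

```python
# Hypothetical provenance record covering the five documentation
# elements; one instance would exist per training dataset.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class TrainingDataProvenance:
    source_name: str                # 1. source identification
    consent_basis: str              # 1. "contract", "consent", "legitimate_interest"
    privacy_technique: str          # 2. e.g. "synthetic_ctgan", "dp_laplace"
    privacy_parameters: dict        # 2. e.g. {"epsilon": 3.0}
    representativeness_tests: list  # 3. (test name, result) pairs
    date_range: tuple               # 4. (start_date, end_date) currency
    change_log: list = field(default_factory=list)  # 5. retraining events

record = TrainingDataProvenance(
    source_name="claims_adjuster_notes_internal",
    consent_basis="contract",
    privacy_technique="synthetic_transformer",
    privacy_parameters={"distributional_validation": "passed"},
    representativeness_tests=[("geographic_coverage", "passed"),
                              ("four_fifths_rule", "passed")],
    date_range=(date(2021, 1, 1), date(2025, 12, 31)),
)
```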
Implementation Architecture
The practical architecture for a compliance-ready training pipeline combines multiple privacy-preserving techniques rather than relying on any single approach:
| Training Data Type | Primary Technique | Documentation Artifact |
|---|---|---|
| Claims conversations and transcripts | Synthetic generation (transformer-based) | Generation methodology report, distributional validation, bias testing results |
| Underwriting submission patterns | Differential privacy (epsilon 2.0-4.0) | Privacy budget allocation, utility-privacy tradeoff analysis |
| Cross-carrier loss modeling data | Federated learning | Federation protocol, parameter aggregation method, no-data-movement attestation |
| Policyholder demographic features | Synthetic generation with representativeness constraints | Demographic parity testing, four-fifths rule validation |
| Historical pricing and rating data | Statistical simulation (copula-based) | Distributional fit tests, tail behavior validation |
This layered approach allows carriers to document a clear privacy-preserving technique for each data component, satisfying the NAIC's disclosure requirements without relying on blanket assertions about anonymization that may not withstand scrutiny.
What "Good Enough" Looks Like for Late 2026 Compliance
The NAIC chose a registration regime rather than licensure, meaning the initial compliance bar is transparency rather than prescriptive technical standards. Carriers and vendors must disclose what they did, not necessarily prove it was optimal. This distinction shapes the practical compliance threshold for the first implementation wave.
For the 12-state pilot concluding in September 2026, carriers participating in the evaluation will need to complete Exhibit D for each high-risk AI system. The exhibit examines data sources, quality controls, representativeness, and proxy discrimination potential. Carriers that can produce coherent documentation across these dimensions, even if imperfect, will be in a materially stronger position than those presenting documentation gaps.
The minimum viable compliance posture for a carrier deploying agentic AI by late 2026 includes: a training data inventory identifying all sources by type and date range; documentation of at least one privacy-preserving technique applied to each source containing personal information; bias testing results across the protected characteristics the NAIC specifically screens for (race, ethnicity, and proxies including ZIP code, credit variables, and aerial imagery correlates); and a change management procedure describing how the training pipeline handles new data additions and model retraining events.
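The four-fifths (80%) rule referenced in the architecture table reduces to a simple ratio test. A minimal sketch, with illustrative group labels and approval rates:

```python
# Four-fifths rule: flag any group whose selection rate falls below
# 80% of the most-favored group's rate. Inputs are illustrative.
def four_fifths_check(approval_rates: dict) -> dict:
    best = max(approval_rates.values())
    return {group: (rate / best) >= 0.8
            for group, rate in approval_rates.items()}

rates = {"group_a": 0.62, "group_b": 0.55, "group_c": 0.44}
print(four_fifths_check(rates))
# {'group_a': True, 'group_b': True, 'group_c': False}
```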
Gartner's warning adds urgency to implementation timelines: over 40% of agentic AI projects will be canceled by end of 2027 due to escalating costs, unclear business value, or inadequate risk controls. Carriers that treat data provenance documentation as a post-deployment retrofit rather than a design-time requirement are disproportionately likely to fall into the cancellation cohort. Building privacy-preserving architecture into the training pipeline from inception costs a fraction of retrofitting it after a regulatory examination identifies gaps.
Actuarial Implications
For pricing actuaries, the shift to privacy-preserving training data introduces quantifiable prediction uncertainty. Differential privacy noise, synthetic data approximation error, and federated learning convergence gaps all contribute to model prediction intervals wider than those achieved with unrestricted real data. Actuaries signing rate filings that rely on AI-assisted models should understand and quantify this additional uncertainty in their actuarial memoranda, consistent with ASOP No. 56 requirements for documenting model limitations.
For reserving actuaries, the emergence of agentic claims handlers trained on synthetic data raises questions about loss development pattern stability. If agents trained on synthetic conversations handle claims differently than human adjusters trained on real interactions, historical development patterns may not fully apply to the AI-handled claim cohort. Early segregation of AI-handled claims in development triangles would allow actuaries to monitor for divergent development patterns before they materially impact reserve adequacy.
For model validation actuaries, the NAIC framework creates new validation requirements specifically around training data. Model validation under ASOP No. 56 has traditionally focused on model outputs (prediction accuracy, stability, reasonableness). The disclosure framework extends validation upstream to training data provenance, representativeness, and privacy compliance. Validation reports for AI systems should now address whether training data documentation is sufficient for regulatory disclosure, not just whether model predictions are statistically sound.
Further Reading
- Carrier AI Projects Fail at the Audit Layer, Not the Tech - Why governance documentation, not model performance, determines which carrier AI deployments survive regulatory examination.
- Hartford's Algorithmic Impact Assessment Sets the Carrier Transparency Bar - How Hartford operationalized AI governance documentation including data provenance controls ahead of regulatory requirements.
- NAIC Flags Agentic AI as Insurance's Next Governance Gap - The Spring 2026 panel on autonomous AI risks and what actuaries validating agentic workflows must consider.
- The AI Governance Gap in Actuarial Practice - ASOP 56 compliance and model risk management challenges as AI systems scale beyond pilot deployments.
- Insurer AI Adoption Hits 82% But Only 7% Reach Full Scale - The adoption-to-scale gap that governance and data infrastructure failures largely explain.
Sources
- Mondaq: NAIC Spring 2026 Meeting Third-Party Data and Models (H) Working Group (March 2026)
- Alston & Bird: Key AI, Cybersecurity, and Privacy Takeaways from the NAIC 2026 Spring Meeting
- Sidley Austin: Regulatory Update NAIC Spring 2026 National Meeting
- Swept AI: The 2026 NAIC Third-Party Model Law Analysis
- Swept AI: NAIC AI Systems Evaluation Tool 12-State Pilot 2026
- NAIC: Third-Party Data and Models Working Group Spring 2026 Meeting Materials
- Celent: Shedding Light on Agentic AI in Insurance (2026)
- SAS: Insurance's New Operating System for 2026: AI (December 2025)
- Society of Actuaries: Federated Learning for Insurance Companies (2024)
- ASTRI & Hong Kong Insurance Authority: Federated Learning - Unlocking Innovation in the Insurance Sector (November 2025)
- DataCebo: MAPFRE Case Study - Better Detection of Homeowner Insurance Fraud with Synthetic Data
- Goodwin Law: California's AB 2013 Generative AI Training Data Transparency Act Takes Effect (January 2026)
- Carrier Management: How Insurance Leaders Can Leverage AI Without Sacrificing Trust (April 2026)
- Fenwick: NAIC Expands AI Systems Evaluation Tool Pilot Program to 12 States
- Texas Attorney General: Lawsuit Against Allstate and Arity (January 2025)
- Gartner: Over 40% of Agentic AI Projects Will Be Canceled by End of 2027
- Mayer Brown: NAIC Spring 2026 Innovation, Cybersecurity and Technology Committee Update