The pilot that launched March 2, 2026 across 12 states has now been running long enough for a pattern to emerge in what state examiners ask for first. The answer, from multiple pilot-state engagements, is the same: produce a complete inventory of every AI and predictive model in production use across underwriting, pricing, claims, fraud detection, and customer service. That single request is diagnostic, because a company's ability to answer it in one meeting rather than across several weeks of internal discovery tells the examiner most of what it needs to know about the maturity of the governance program underneath.

The Big Data and Artificial Intelligence Working Group's June 1, 2026 public meeting brought a panel discussion on AI governance trends together with a pilot progress update, a pairing that reflects how quickly the tool has moved from working-group discussion to field-test artifact. Twelve states running Exhibit A requests since March have now collected enough model inventory data to know which questions need richer answers, and the Working Group's choice to put governance trends on the same agenda as pilot results is a signal: the tool is no longer a draft being refined in the abstract. It is generating documentation requests from domestic regulators using existing examination authority, before any new model law takes effect, before the tool is formally adopted.

Pennsylvania Insurance Commissioner Michael Humphreys, who chairs the Working Group, has framed the four-exhibit tool as a structured way for examiners to surface AI risk before deciding whether to open a deeper market conduct or financial examination. That framing is accurate but incomplete. What the structure also does is specify, for the first time in a regulatory context, which teams inside an insurer own which pieces of the AI evidence trail. For a surprising number of exhibits, that trail runs directly through the actuarial department.

When Governance Trends Become Examination Materials

The AI Systems Evaluation Tool is a four-exhibit sequence that regulators apply progressively, with each stage gate-keeping whether to escalate. Exhibit A quantifies AI footprint: how many models are in production, in which business functions, with what potential for consumer or financial impact. Exhibit B examines governance: the written AI program, third-party vendor oversight procedures, and how AI risk is incorporated into the Enterprise Risk Management framework and the Own Risk and Solvency Assessment. Exhibit C collects detailed disclosures on each model classified as high risk, including technical design, validation methodology, testing results, and human oversight levels. Exhibit D examines the data feeding AI systems: sources, training data composition, external licensing, and data quality controls.

The sequence matters because it determines who in the company gets drawn into the exam. A well-documented Exhibit A that shows a modest AI footprint concentrated in back-office functions can keep the examination from escalating to C and D at all. A fragmented inventory, or one that reveals broad consumer-facing AI use without corresponding governance documentation, invites the deeper exhibits and, potentially, a referral to a more intensive review of specific models and business functions.

Most pilot-state participants have been asked to complete Exhibit A first. That sounds straightforward, but in practice it requires an enterprise-level model inventory that few carriers have maintained at the level of granularity the exhibit implies. The request typically asks: how many AI and predictive models are in production, how many were added or retired in the past year, which ones make or materially influence decisions that affect policyholders, and whether any have generated consumer complaints. Answering confidently requires not just a list but governance documentation that ties each model to a responsible owner, a deployment approval, and a monitoring record.
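
As a rough illustration of what answering those questions from a single inventory could look like, the sketch below computes the four Exhibit A-style counts and flags models missing an owner, approval, or monitoring record. The field names, the 365-day lookback, and the sample models are all assumptions made for illustration; the exhibit itself does not prescribe a schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class ModelRecord:
    # Hypothetical schema: none of these field names come from the exhibit.
    name: str
    business_function: str
    deployed_on: date
    retired_on: Optional[date] = None
    affects_policyholders: bool = False
    consumer_complaints: int = 0
    owner: Optional[str] = None
    approval_ref: Optional[str] = None
    monitoring_ref: Optional[str] = None


def exhibit_a_summary(models, as_of):
    """Summarize an inventory along the lines of the Exhibit A questions."""
    window_start = as_of - timedelta(days=365)  # assumed meaning of "past year"
    in_production = [
        m for m in models
        if m.deployed_on <= as_of and (m.retired_on is None or m.retired_on > as_of)
    ]
    return {
        "in_production": len(in_production),
        "added_past_year": sum(window_start < m.deployed_on <= as_of for m in models),
        "retired_past_year": sum(
            m.retired_on is not None and window_start < m.retired_on <= as_of
            for m in models
        ),
        "affecting_policyholders": sum(m.affects_policyholders for m in in_production),
        "with_complaints": sum(m.consumer_complaints > 0 for m in in_production),
        # Models that cannot be tied to an owner, an approval, and a monitoring record.
        "undocumented": [
            m.name for m in in_production
            if not (m.owner and m.approval_ref and m.monitoring_ref)
        ],
    }


# Illustrative, made-up inventory.
inventory = [
    ModelRecord("auto_pricing_glm", "pricing", date(2022, 5, 1),
                affects_policyholders=True, consumer_complaints=2,
                owner="Pricing Actuarial", approval_ref="APP-001",
                monitoring_ref="MON-001"),
    ModelRecord("claims_triage", "claims", date(2025, 11, 15),
                affects_policyholders=True),
    ModelRecord("legacy_fraud_score", "fraud", date(2020, 1, 10),
                retired_on=date(2026, 2, 1), affects_policyholders=True),
]
summary = exhibit_a_summary(inventory, as_of=date(2026, 6, 1))
```

The point of the `undocumented` list is the one made above: a count alone is not a confident answer unless each model behind it can be tied to governance records.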

In many companies, the closest analog to that documentation already exists in actuarial files: the actuarial opinion on a rating algorithm, the ASOP 56 documentation memo for a loss reserving model, the validation report for a predictive pricing model submitted to a state for rate approval. These files are not the same as an enterprise AI governance program, but they represent the most organized per-model documentation most insurance companies produce routinely. The June 1 panel discussion on governance trends placed this gap in public view: the Working Group heard perspectives on how the distance between enterprise AI governance expectations and actual company practice plays out differently depending on whether the company's AI capability grew through a traditional actuarial pricing workflow or through newer data science teams built outside the actuarial function. That difference matters because it determines who currently holds the evidence and who needs to produce it for an examiner.

Exhibit C and the Consumer Harm Scoring Problem

Exhibit C is where the regulatory theory of the Evaluation Tool meets the actuarial model portfolio. The exhibit collects detailed information for each model the company classifies as high risk, and the definition the tool applies is specific: a high-risk AI system is one that engages in automated decisioning and could cause adverse consumer, financial, or financial reporting impact.

That definition, read against the typical P&C or life carrier model inventory, points in a clear direction. Pricing models that set rates for personal auto, homeowners, or term life are automated decision systems with direct consumer impact. Underwriting models that generate risk scores used to decline or surcharge applicants carry both consumer and financial impact. Claims triage models that route claims to settlement tracks affect claim duration and payment timing. Fraud detection models can trigger coverage denial or claim suspension. Each of those categories almost certainly qualifies as high risk under the Exhibit C definition, and the proportionality principle the NAIC has built into the tool, allocating more regulatory scrutiny to higher-harm AI systems and less to back-office applications, means these models will receive the deepest examination attention.
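
The two-part structure of that definition (automated decisioning, plus at least one kind of potential adverse impact) can be written down as a simple rule. The sketch below is one possible encoding, not the tool's own scoring method: the flag names are hypothetical, the flags themselves represent a company's own judgment about each model, and the back-office example is invented for contrast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    # Hypothetical flags; each is a judgment the company would have to make and defend.
    name: str
    automated_decisioning: bool
    consumer_impact: bool = False
    financial_impact: bool = False
    financial_reporting_impact: bool = False


def is_high_risk(m: ModelProfile) -> bool:
    """Automated decisioning AND at least one potential adverse impact."""
    return m.automated_decisioning and (
        m.consumer_impact or m.financial_impact or m.financial_reporting_impact
    )


portfolio = [
    ModelProfile("personal_auto_pricing", True, consumer_impact=True),
    ModelProfile("underwriting_risk_score", True, consumer_impact=True,
                 financial_impact=True),
    ModelProfile("claims_triage", True, consumer_impact=True),
    ModelProfile("fraud_detection", True, consumer_impact=True),
    # Invented back-office example: assists staff and makes no automated decision.
    ModelProfile("mailroom_document_sorter", False),
]
high_risk = [m.name for m in portfolio if is_high_risk(m)]
```

Under these assumed flags, the four consumer-facing categories discussed above all classify as high risk and the back-office tool does not, which is the proportionality outcome the tool is designed to produce.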

For actuaries, this carries a specific implication: the models most likely to land in Exhibit C are models that actuaries either built, validated, or signed off on for regulatory filings. The personal auto pricing model submitted to California under Proposition 103, or to Texas under the file-and-use framework, carries actuarial documentation of its underlying variables, rating factors, and expected loss cost relationships. That documentation exists because state rate filing requirements created the demand for it decades before the NAIC's AI pilot existed. The pilot is now creating parallel demand from the examination side, and the two documentation streams may not match in format, scope, or language, even when they describe the same model.

The claims denial question adds another layer. The National Council of Insurance Legislators has developed a parallel model AI act, based on Florida legislation that stalled in 2025, which would require any AI-generated claim denial to be independently reviewed by a "qualified human professional" before issuance. Under the model act, that person would have to independently analyze the claim facts and policy terms, review the AI output for accuracy, and review prior human-made decisions on the same claim. In states where similar provisions have passed or are pending, carriers face a documentation obligation on top of the Evaluation Tool: they must show not just that a human is technically in the loop, but that the human's review is recorded as a genuine independent analysis.

Exhibit D complicates the picture further. The data exhibit asks about training data composition, external data licensing, and the potential for proxy discrimination in AI inputs. For rate-setting models, this reaches into the variables that proxy for protected characteristics, including credit-based insurance scores, geographic rating factors, and external data enrichment from consumer data aggregators. The Working Group's particular concern about aerial imagery, social media data, and consumer database inputs reflects the same fairness priorities underlying Colorado's SB 21-169 algorithmic bias requirements. The difference is that Exhibit D converts those concerns into an examination inquiry, not just a regulatory principle.

Three ASOPs Against the Four Exhibits

Three actuarial standards of practice govern the model work that sits closest to what the NAIC's evaluation exhibits require. The alignment is close enough that carriers with mature actuarial documentation programs may find they have already produced much of what an examiner would want. The gap is that actuarial documentation is structured for individual models, produced in support of specific analyses, while the Evaluation Tool asks for enterprise inventory with consistent metadata across all functions.

ASOP No. 56, Modeling, sets out what an actuary must do when developing, selecting, modifying, using, or reviewing a model. The standard requires the actuary to assess the model's reasonableness and appropriateness for its intended purpose, evaluate input data quality, review the model's structure for known weaknesses, perform sufficient testing, and disclose material limitations in the actuarial document. Those requirements map closely onto what Exhibit C asks for in the high-risk system details section: how the model was developed, what data it uses, what testing was performed, what limitations have been identified, and what ongoing monitoring is in place. A well-documented ASOP 56 compliance memo for a pricing or reserving model is not identical to an Exhibit C response, but it covers the same factual ground. The chief actuary who can point an examiner to ASOP 56 documentation for each high-risk model is in a materially different position than the chief actuary who has to explain why those models lack structured documentation at all.

ASOP No. 23, Data Quality, governs how an actuary handles the data underlying actuarial work. The standard requires assessment of data quality and reasonableness, documentation of what data was used and where it came from, and disclosure of any known data defects that could affect the analysis. Exhibit D asks for exactly this information at the model level: the sources of training and operational data, controls to ensure data quality and representativeness, and how the potential for proxy discrimination in input variables is identified and managed. The data quality documentation an actuary produces under ASOP 23, typically embedded in actuarial opinions or memoranda supporting rate filings, covers source identification and reasonableness checks. It does not typically include a discriminatory impact analysis across protected class proxies, because that analysis has not historically been part of the actuarial rate-filing workflow. ASOP No. 12, which governs risk classification across all practice areas, is currently under revision to incorporate fairness and bias guidance. Until that revision is finalized and adopted, the examination expectation for proxy discrimination screening in Exhibit D sits between what ASOP 23 requires and what state rate-filing rules mandate, in territory that actuaries are covering today with documentation habits the revised standard will eventually formalize.

ASOP No. 41, Actuarial Communications, governs the disclosure requirements in actuarial documents. It requires actuaries to identify the purpose and scope of actuarial work, disclose significant limitations, and ensure the intended audience can understand the results. In the Evaluation Tool context, ASOP 41 applies most directly to Exhibit B, the governance risk assessment exhibit, which can be completed in either narrative or checklist form. A carrier whose Exhibit B narrative describes the AI governance program with supporting references to ASOP 41-compliant actuarial opinions, carrying clear scope, limitation, and reliance statements, is demonstrating a documentation culture that regulators trained in examination will recognize as more mature than a program assembled in the weeks before an exam request arrived.

The critical gap is not in what the ASOPs require but in how the resulting documentation is organized. Actuarial documentation is produced model by model, in support of specific analyses, held across individual pricing and reserving actuaries and teams. An examiner asking for an enterprise AI inventory needs information consolidated across all business functions and models, with consistent metadata about each model's purpose, owner, deployment date, and supervisory cadence. Building that structure from underlying actuarial files is possible, but it requires a translation step most actuarial departments have not built into their standard workflows.
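
One way to picture that translation step is a small consolidation routine that maps each team's own field names onto a shared set of metadata and reports what is missing. Everything in the sketch below is assumed for illustration: the alias table, the team names, and the sample records are invented, and a real implementation would depend on how each department actually stores its files.

```python
# The four metadata fields named in the text, under hypothetical key names.
REQUIRED = ("purpose", "owner", "deployment_date", "supervisory_cadence")

# Hypothetical mapping from team-specific labels to the shared field names.
ALIASES = {
    "intended_use": "purpose",
    "model_owner": "owner",
    "responsible_actuary": "owner",
    "go_live": "deployment_date",
    "review_frequency": "supervisory_cadence",
}


def normalize(record):
    """Rename team-specific keys to the shared names; leave other keys as-is."""
    return {ALIASES.get(key, key): value for key, value in record.items()}


def consolidate(team_files):
    """Merge per-team model records into one inventory and list metadata gaps."""
    inventory, gaps = [], {}
    for team, records in team_files.items():
        for record in records:
            row = normalize(record)
            row["source_team"] = team
            inventory.append(row)
            missing = [f for f in REQUIRED if not row.get(f)]
            if missing:
                gaps[row["model"]] = missing
    return inventory, gaps


# Invented sample files from two teams that label the same facts differently.
team_files = {
    "pricing_actuarial": [
        {"model": "auto_pricing_glm", "intended_use": "personal auto rating",
         "responsible_actuary": "J. Doe", "go_live": "2022-05-01",
         "review_frequency": "annual"},
    ],
    "claims_data_science": [
        {"model": "claims_triage", "purpose": "route claims to settlement tracks",
         "go_live": "2025-11-15"},
    ],
}
inventory, gaps = consolidate(team_files)
```

In this made-up example the actuarial record maps cleanly onto the shared fields, while the data science record surfaces with no owner and no supervisory cadence, which mirrors the pattern described above.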

Third-Party Vendors and Where the Evidence Chain Breaks

Most insurer AI footprints extend well beyond internally developed models. Credit-based insurance scores, motor vehicle report enrichment, geospatial risk scores, telematics scoring algorithms, fraud detection networks, and claims automation tools are typically sourced from external vendors including LexisNexis Risk Solutions, Verisk Analytics, TransUnion, and Shift Technology. Those vendor models appear in the Evaluation Tool's scope whether or not the insurer built them.

Exhibit A asks for a count of third-party AI models alongside internally developed ones. Exhibit D asks for data lineage that includes external data sources and their licensing terms. Exhibit C, if a vendor model is classified as high risk, requests detailed validation and testing documentation. That last point creates an immediate practical problem: the vendor controls that documentation, and the insurer cannot simply produce it on demand.

This is the weakest link in most carriers' evidence chains. Internal models can be documented to ASOP 56 standards because the actuarial or data science team that built them can write the documentation. A vendor model comes with a term sheet and sometimes a technical overview, but typically not with the independent validation analysis, the testing-against-adverse-outcome data, or the monitoring procedure documentation that Exhibit C would require. The insurer's obligation under the NAIC Model Bulletin, adopted in December 2023 and now in effect in more than 25 states, covers third-party AI regardless of whether the vendor describes the product as a non-actuarial tool. The gap between that obligation and the documentation actually available from the vendor relationship is real and, in most cases, not resolved by current contract language.

The NAIC is developing a parallel instrument to address this directly. The Third-Party Data and Models Working Group is building a vendor registry framework that would create disclosure and registration requirements for AI vendors supplying models to insurers. The registry concept would allow examiners to query a centralized source for vendor-level documentation rather than forcing each insurer to independently obtain and maintain proprietary model documentation from every vendor relationship. That registry does not yet exist. But the direction it points is relevant to how carriers structure vendor contracts now, before the registry creates its own compliance layer on top of the evaluation tool's documentation requirements. Development of the NAIC's Third-Party AI Vendor Registry is running alongside the pilot and is expected to inform the November 2026 adoption discussions.

The practical answer today is contractual. Insurers entering or renewing vendor agreements for AI and predictive models should negotiate audit rights that give the insurer access to validation documentation, testing results, and monitoring data sufficient to respond to a regulatory inquiry. The standard vendor agreement for a credit scoring product or a geospatial risk model typically does not include those provisions. Adding them is easier in a new contract than renegotiating an existing one. Carriers that receive pilot examination requests in 2026 and discover that their vendor contracts do not support production of Exhibit C documentation are facing a gap that cannot be closed on the timeline an examiner imposes.
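
A carrier can test for that gap before an examiner does by comparing, for each high-risk vendor model, the evidence its contract gives access to against the three kinds of evidence named above. The sketch below assumes a simple record per vendor model; the labels, vendor names, and contract contents are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the three kinds of evidence named in the text.
NEEDED = {"validation_documentation", "testing_results", "monitoring_data"}


@dataclass
class VendorModel:
    model: str
    vendor: str
    high_risk: bool
    # Evidence the current contract entitles the insurer to obtain (assumed field).
    contract_access: set = field(default_factory=set)


def exhibit_c_gaps(models):
    """Map each high-risk vendor model to the evidence its contract does not cover."""
    gaps = {}
    for m in models:
        missing = NEEDED - m.contract_access
        if m.high_risk and missing:
            gaps[m.model] = sorted(missing)
    return gaps


# Invented vendor relationships for illustration only.
vendor_models = [
    VendorModel("credit_based_insurance_score", "Vendor A", True,
                {"validation_documentation"}),
    VendorModel("geospatial_risk_score", "Vendor B", True, set(NEEDED)),
    VendorModel("document_ocr", "Vendor C", False),
]
contract_gaps = exhibit_c_gaps(vendor_models)
```

Run against a real contract register, a check like this would give legal and procurement teams a prioritized list of agreements to amend at renewal.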

Building the Evidence File Before the Examiner Asks

The pilot runs through September 2026. The Working Group will revise the tool in October, expose it for public comment, and bring it to a possible adoption vote at the NAIC Fall National Meeting in November 2026. Insurers not currently in pilot states have roughly five months before the tool is likely to become a standard examination instrument with nationwide application. The documentation work that matters most is not novel. It is largely an extension of what ASOP compliance already produces, organized in a format that serves examination inquiries rather than individual actuarial analyses.

A model inventory is the foundation of Exhibit A compliance. The inventory should record every AI and predictive model in production, the business function it supports, the responsible owner, the vendor or internal team that developed it, the deployment approval date, and the current supervisory cadence. Actuarial models already tracked through pricing, reserving, and rate-filing workflows form the core of this inventory. Data science models built outside the actuarial function, often in marketing, claims, or fraud teams, typically have less formal documentation and require the most work to bring into an enterprise-level view with consistent metadata.

Governance documentation supporting Exhibit B should describe how the company's AI governance program integrates with ERM and ORSA, who holds responsibility for each model category, how third-party vendor oversight is conducted, and how the company documents and addresses adverse outcomes from AI systems. The actuarial function's role in model review and sign-off should be explicit, with clear language about which models require an actuarial opinion and which governance processes are triggered when actuarial models are materially updated or replaced.

Per-model documentation for Exhibit C is best structured around an ASOP 56-aligned template: intended purpose, development methodology, input data description, validation results, known limitations, human oversight level, and monitoring plan. Carrying a structured data sheet for each high-risk model does not require building a new documentation system from scratch. It requires organizing the ASOP 56 compliance memo so that its contents can be extracted from the actuarial file and presented to an examiner directly, without requiring the examiner to read through a full actuarial opinion to locate the relevant disclosures.
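
A minimal sketch of such a data sheet follows, using the seven sections listed above as fields. The class, its method names, and the sample content are assumptions for illustration; neither ASOP 56 nor the Evaluation Tool prescribes this layout.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelDataSheet:
    # One field per section of the template described in the text.
    intended_purpose: str = ""
    development_methodology: str = ""
    input_data_description: str = ""
    validation_results: str = ""
    known_limitations: str = ""
    human_oversight_level: str = ""
    monitoring_plan: str = ""

    def missing_sections(self):
        """Names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self, model_name):
        """Plain-text sheet an examiner could read without the full opinion."""
        lines = [f"Model: {model_name}"]
        for f in fields(self):
            label = f.name.replace("_", " ").capitalize()
            value = getattr(self, f.name).strip() or "NOT DOCUMENTED"
            lines.append(f"{label}: {value}")
        return "\n".join(lines)


# Invented content standing in for text extracted from an actuarial memo.
sheet = ModelDataSheet(
    intended_purpose="Personal auto loss cost estimation",
    development_methodology="Generalized linear model",
    input_data_description="Policy and claim history; see data memo",
    validation_results="Holdout lift and stability tests",
    known_limitations="Sparse data in some territories",
)
missing = sheet.missing_sections()
text = sheet.render("auto_pricing_glm")
```

Keeping the empty sections visible as "NOT DOCUMENTED" rather than omitting them makes the remaining work obvious before the sheet is handed to an examiner.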

Data lineage records supporting Exhibit D require the most additional work for most actuarial departments. The ASOP 23 documentation an actuary produces records what data was used and that it was deemed of sufficient quality for the analysis. An Exhibit D response also needs to identify proxy discrimination risk in the variables used, trace the lineage of external data through licensing relationships, and document how data representativeness was assessed in training. The ASOP 12 revision, expected to incorporate bias and fairness guidance for risk classification models, will eventually create a standard framework for this analysis. Actuarial teams building data quality documentation for Exhibit D now are doing work that the revised ASOP will formalize, rather than work that the revised ASOP will contradict.

What the Pilot Results Will Determine

The pilot's September end date is the point at which feedback consolidates into a revised tool. What the 12 pilot states learn about insurer documentation practices, and specifically where the evidence chain breaks down, will directly shape what becomes standard nationwide. Several outcomes seem likely from the pattern of Exhibit A requests that have already been processed.

First, the proportionality principle will survive the revision intact. High-harm consumer-facing AI systems, the kind that actuaries design and validate for pricing, underwriting, and claims settlement, will remain the primary targets of examination scrutiny. The pilot states that have been running the tool since March have enough inventory data to confirm that the consumer harm hierarchy in Exhibit C's high-risk definition points examination resources in the right direction. There is no indication from the June 1 update that the Working Group is considering changes to that framing.

Second, the vendor documentation gap is likely to surface as the most common unresolved finding across pilot examinations. Insurers can control the documentation they produce internally. They cannot control the documentation their AI and data vendors produce or choose to withhold. The gap will inform both the final Evaluation Tool design and the parallel Third-Party Data and Models Law discussion. Whatever emerges from the Fall 2026 adoption process will almost certainly include stronger language about insurer obligations to secure documentation access from vendors before deploying vendor AI in consumer-facing functions.

Third, the ASOP framework will become more relevant to compliance discussions, not less, as the tool matures. The NAIC's Market Conduct Modernization Working Group, formed at the Spring 2026 meeting, will translate pilot findings into structural exam methodology recommendations by Fall 2026, and the methodological bridge between ASOP documentation and examination evidence is a natural target for that work. Actuaries already producing documentation under ASOP 56, 23, and 41 are generating the closest thing to examination-ready AI documentation that exists in most insurance companies. The gap is organizational, not substantive. Closing it requires connecting actuarial files to an enterprise model inventory, not rebuilding the documentation from scratch.

Carriers that begin that organizational work now, before the examiner asks, are in a fundamentally different position than those that scramble to assemble a model inventory under examination pressure. The evidence file being built informally today, by actuaries producing ASOP documentation, by governance teams assembling model registries, by legal teams negotiating vendor audit rights, is the same file an examiner will ask for in twelve months. Whether it is assembled in advance or under examination pressure is largely a question of whether the actuarial and governance functions are working from the same inventory now.
