The NAIC's Big Data and Artificial Intelligence (H) Working Group began running its AI Systems Evaluation Tool through a live pilot on March 1, 2026. Twelve state insurance departments, including California, Florida, Pennsylvania, and Wisconsin, are now requesting governance documentation, model inventories, and high-risk system details from domestic insurers under a framework that did not exist 12 months ago. The joint trade-group letter that landed on regulators' desks in early December called the pilot "one-sided, voluntary for regulators while compulsory for companies," and that tension is now playing out in actual exam rooms. A review of all 33 comment letters submitted in response to the NAIC's May 2025 Request for Information on a possible AI Model Law shows that the split between carriers favoring principles-based guidance and consumer groups demanding prescriptive rules is wider than the public commentary suggests, and the pilot is the bridge regulators are using to cross between them.
What the AI Systems Evaluation Tool Actually Asks For
The Evaluation Tool is not a single questionnaire. It is a four-exhibit framework that lets a state regulator dial up or down the depth of inquiry based on what an Exhibit A response reveals. Pennsylvania Insurance Commissioner Michael Humphreys, who chairs the Working Group, has framed the tool as a standardized way to surface inherent risk before a full market conduct or financial examination is opened. The four exhibits are designed to be used in sequence, though regulators retain discretion to skip ahead or limit scope.
Exhibit A: Quantification of AI System Use. The opening exhibit is essentially an AI inventory. Companies are asked how many models are in production, how many are new versus updated versus retired, which ones have direct consumer or material financial impact, and whether any have triggered consumer complaints. The questions appear simple, but for a multi-line carrier with hundreds of models spread across pricing, underwriting, claims triage, fraud detection, and marketing, producing a defensible count requires a model inventory most companies have not maintained at this level of granularity. Exhibit A is the screening exhibit: based on what regulators see, they decide whether to escalate to one or more of the deeper exhibits.
Exhibit B: Governance Risk Assessment Framework. Exhibit B asks about the AI governance program itself, including roles and responsibilities, third-party vendor oversight, transparency disclosures, monitoring procedures, and how the company integrates AI considerations into its Enterprise Risk Management and ORSA processes. The Working Group offers two formats, narrative and checklist, and individual states can choose either. The Committee of Annuity Insurers, in its comment letter, asked the NAIC to eliminate the narrative form precisely because it creates a risk of overlapping, slightly different questions across jurisdictions.
Exhibit C: High-Risk Model Details. This is where the sharpest industry pushback has landed. Exhibit C requests detailed information on each model the company classifies as "high-risk," including how it was developed, what testing was performed, the level of human-in-the-loop involvement, and how the model is reviewed for compliance. The criteria for "high risk" are set by the insurance company itself in the current draft, which gives carriers flexibility but also leaves room for regulators to challenge where the boundary is drawn. AHIP and several life industry trade groups have asked the NAIC to anchor the tool to a common high-risk definition rather than letting each state interpret it differently.
Exhibit D: AI Systems Model Data Details. The fourth exhibit focuses on the data feeding AI systems: external versus internal sources, third-party data licensing, training data composition, and the lineage of inputs into models that influence consumer outcomes. Exhibit D is the one most likely to be invoked by market conduct examiners responding to a consumer complaint, because it traces the upstream causes of a downstream decision.
Information collected through any of the four exhibits will be protected under the confidentiality rules of the administering state, and the Working Group has emphasized that participating regulators will "leverage their exam authority" rather than create new disclosure requirements. That distinction matters because it means the legal foundation for the pilot is the existing financial and market conduct exam framework, not a freestanding regulatory regime that would require legislative action.
The 12-State Pilot: Who, When, and How
When the Working Group first announced the pilot in late 2025, it referenced ten participating insurers selected by a smaller group of states. By February 2026, after California joined, the pilot had grown to 12 states: California, Colorado, Connecticut, Florida, Iowa, Louisiana, Maryland, Pennsylvania, Rhode Island, Vermont, Virginia, and Wisconsin. The pilot runs from March 2026 through September 2026. Each state will use the tool with its domestic insurers, focusing first on companies with the most significant AI footprints, and pilot states will meet monthly to share lessons and reconcile inconsistencies in how the exhibits are being applied.
The September end date is not the end of the work. The Working Group has signaled that pilot feedback will be used to revise the tool in September and October 2026, with the updated version re-exposed for public comment ahead of a possible adoption vote at the NAIC Fall National Meeting in November 2026. Judged against NAIC working group timelines over the past several years, that schedule is aggressive but consistent with how the Model Bulletin moved from concept to adoption in roughly 14 months in 2022 and 2023.
Some insurers in pilot states have already received participation requests. Foley & Lardner attorneys, advising clients in February 2026, characterized the request as something insurers should treat "like an early-stage exam inquiry," and they explicitly warned that participation could be effectively mandatory at the discretion of the state. That framing matters. A request from a domestic regulator under existing exam authority is not something a carrier can decline without creating its own regulatory risk, even if the underlying tool is described as voluntary at the state level.
The Joint Industry Letter and What It Actually Argues
On December 5, 2025, six trade associations representing life, health, property and casualty, mutual, and reinsurance insurers filed a joint letter raising five specific objections to the pilot. The letter is worth reading carefully because it distills concerns that go beyond the usual regulatory process complaints.
The first objection is structural: the pilot is voluntary for regulators and compulsory for companies. A state can choose whether to participate, and within a participating state a regulator can choose whether to send a request to a particular insurer. Once a request is sent, however, the company has no realistic option to decline. The associations argue this is an asymmetry that no prior NAIC pilot has imposed.
The second objection is that the pilot lacks a defined duration. While regulators have publicly stated the pilot runs March through September 2026, the trade groups note that nothing in the tool itself binds states to that timeline, and the door is open to extending information requests well beyond the nominal end date. For a CAE or chief actuary trying to plan model documentation work, an open-ended regulatory inquiry creates a budgeting problem that rolls forward indefinitely.
The third objection concerns the dual-purpose nature of the tool. The same exhibits can be deployed as part of a market conduct exam, which historically focuses on consumer treatment, or as part of a financial examination, which focuses on solvency and reserving. Combining these two examination tracks under one questionnaire blurs lines that have historically protected insurers from having a single piece of information used in two different enforcement contexts. The trade groups see this as a path to compounded findings.
The fourth objection is more pointed: the trade groups state that companies can apparently be penalized for any "negative" findings based on data gathered via the tool during the pilot phase, before the tool is finalized. Under the standard NAIC pilot process, results are typically used only to refine the instrument, not to support enforcement actions. The Working Group has not committed in writing to that protection, which is why the trade groups raised it explicitly.
The fifth objection is that the pilot may begin before the final version of the tool is exposed for comment. The November 19, 2025 draft is the version being used in the pilot, but the comment period on that draft was still open when pilot states began outreach. The trade groups argued the NAIC should follow the process used in previous pilots, where exposure and finalization preceded any field testing.
Iowa Insurance Commissioner Doug Ommen, who has been a visible spokesperson for the Working Group, responded by emphasizing that state insurance commissioners' authority to coordinate AI supervision is "necessary, effective and consistent with federal law." That last phrase is doing significant work in light of recent events at the federal level.
From Bulletin to Tool to Possible Model Law
The current debate makes more sense when placed against the longer arc of NAIC AI work. The 2020 NAIC Principles on Artificial Intelligence were the first formal output, followed in December 2023 by the Model Bulletin on the Use of Algorithms, Predictive Models, and Artificial Intelligence Systems by Insurers. The bulletin is principle-based and relies on existing laws, particularly the Unfair Trade Practices Act and unfair claims settlement statutes, for enforcement. As of late 2025, 24 states had adopted the bulletin or pursued legislation, and an additional four states had taken related regulatory action, with the NAIC's own page reporting 25 adopting states by March 2026.
The May 15, 2025 RFI on a potential AI Model Law was the next step. The Working Group exposed the RFI for a 45-day comment period that ended June 30 and received 33 written comment letters from state departments of insurance, consumer representatives, health provider groups, trade organizations, insurtechs, an advisory organization, and consultants. The split in those responses is the single most important data point for understanding where the Working Group is heading.
Reading the responses, the divide is not simply industry against consumers. The American Academy of Actuaries, in its comment letter, supported a principles-based approach and explicitly recommended that any model language "be as consistent as possible across states and jurisdictions, while allowing individual modifications by each state." The Academy also endorsed the idea of deploying the Evaluation Tool on a pilot basis in 2026 with the intention of evaluating its use and making refinements before any broader adoption. That position threads the needle: support for the tool, support for consistency, and skepticism about prescriptive statutory mandates.
The American Council of Life Insurers and the American Property Casualty Insurance Association argued against pursuing a model law at all, recommending instead that the NAIC focus on continued adoption of the Model Bulletin and on developing additional, uniform, and flexible guidelines. Consumer groups and several state regulators, particularly from Vermont and Colorado, pushed in the opposite direction, urging the NAIC to develop standardized consumer disclosures and to address gaps the bulletin does not cover. North Dakota's chief examiner asked for a gap analysis specifically focused on what a model law could provide that is not already in the bulletin.
Pennsylvania's Commissioner Humphreys, summarizing the comments at a September 29, 2025 interim meeting, noted that the Working Group had not reached a foregone conclusion and that the purpose of the RFI was to determine whether existing laws and regulations are sufficient. The current operational answer is to wait for pilot feedback before deciding. The Evaluation Tool is, in effect, a fact-finding exercise that will inform whether a model law is needed and what it should cover.
NCOIL's Parallel Track
While the NAIC has been proceeding methodically, the National Council of Insurance Legislators (NCOIL) opened a second front. At its July 17, 2025 meeting, the NCOIL Financial Services and Multi-Lines Issues Committee introduced a draft "NCOIL Model Act Regarding Insurers' Use of Artificial Intelligence," sponsored by New York Assemblyman Erik Dilan and Oklahoma Representative Forrest Bennett. The draft is based on Florida Senate Bill 794 and Florida House Bill 1555, neither of which passed in 2025.
The NCOIL model is narrower in scope than what a future NAIC model law might be, but it is more prescriptive within that scope. It requires that decisions to deny insurance claims be made by a "qualified human professional," defined as an individual with statutory authority to address or deny a claim. Before issuing a denial, the qualified human must independently analyze the claim facts and policy terms, review the AI-generated outputs for accuracy, and review previous human-made decisions on the claim. In practice, the model would prohibit AI from being the sole basis for a claim denial across all lines of insurance.
Industry opposition has been strong. ACLI, NAMIC, APCIA, and AHIP submitted a joint letter to NCOIL arguing that a one-size-fits-all approach is inappropriate, that existing technology-neutral laws are sufficient, and that NCOIL should first assess whether there are specific risks of adverse outcomes not already covered by current law. The American Medical Association supported the model, citing patient protection concerns, particularly around health insurance prior authorization and AI-driven coverage determinations.
The practical risk for actuaries and compliance teams is the prospect of two competing frameworks moving on parallel tracks. The NCOIL model focuses narrowly on claims handling and human-in-the-loop requirements. A future NAIC model law would likely cover the full AI lifecycle and apply across lines of business. If both move to adoption, individual states would face the question of which framework to enact, and a multi-state carrier could end up needing to comply with overlapping requirements that were never designed to fit together.
The 24-State Bulletin Patchwork
The 24-state Model Bulletin adoption count masks meaningful variation. New York implemented its own framework through Department of Financial Services Insurance Circular Letter No. 7, released in July 2024, which requires insurers operating in the state to establish a risk management framework specifically for AI and external consumer data sources. Colorado's Artificial Intelligence Act, passed in May 2024, layered governance and testing procedures on top of the state's existing insurance regulation. California's Fair Employment and Housing Act AI rules took effect October 1, 2025, with reach into how insurers use algorithmic decision systems. Texas enacted the Texas Responsible Artificial Intelligence Governance Act (TRAIGA) in June 2025, with the Texas attorney general holding general enforcement authority but insurance entities continuing to be regulated by the state Department of Insurance.
What this means in practice is that even if a carrier has documented compliance with the NAIC Model Bulletin in 24 jurisdictions, it may still face state-specific requirements in New York, Colorado, California, and Texas that go beyond the bulletin in either scope or specificity. The Evaluation Tool pilot adds another layer for the 12 participating states, since pilot questions can extend beyond what the bulletin requires on its face. In the bulletin implementations state regulators have carried out so far, the practical compliance burden has scaled with the size of an insurer's geographic footprint, not with the complexity of its AI use.
The Federal Preemption Shadow
Layered on top of all of this is the question of federal preemption. President Donald Trump signed an executive order in early December 2025 establishing a federal AI regulation framework intended to create a single national standard and preempt the patchwork of state rules. The executive order is broader than insurance, but its application to insurance has been the subject of immediate debate. Commissioner Ommen, addressing the topic at a Working Group meeting, argued that any attempt to prevent state insurance commissioners from coordinating AI supervision "would impact consumers negatively and represent a significant departure from our state regulatory system that's worked for over 150 years."
The McCarran-Ferguson Act has long shielded the business of insurance from federal preemption in most areas, and any serious attempt to override state insurance authority on AI would face significant statutory and constitutional challenges. But the executive order has created enough uncertainty that some carriers are quietly reconsidering whether to invest in detailed compliance with the NAIC tool when a federal standard could reset the playing field. Regulators, for their part, appear determined to proceed as if the executive order does not change their authority. The pilot is moving forward on schedule.
What Actuarial Teams Need to Do Now
Whether or not a carrier is in a pilot state, the AI Systems Evaluation Tool sets a benchmark for what regulators expect AI governance documentation to look like. A close reading of the four exhibits and the comment letters submitted by industry reveals several practical compliance gaps that show up consistently across companies.
The first gap is the model inventory itself. Exhibit A's questions assume the company can produce a current count of AI models by type, by use case, by consumer impact, and by financial materiality. Many carriers have governance frameworks that document individual high-stakes models well but lack a consolidated, queryable inventory of all production models. Building that inventory is the first step toward being able to respond to an Exhibit A request without scrambling.
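To make the inventory requirement concrete, here is a minimal sketch of the kind of consolidated, queryable model inventory that could answer Exhibit A-style counting questions. All field names, categories, and the aggregation keys are illustrative assumptions, not the NAIC's actual Exhibit A schema.

```python
from dataclasses import dataclass
from collections import Counter

# Illustrative only: fields and category values are assumptions,
# not the NAIC's Exhibit A data model.
@dataclass(frozen=True)
class ModelRecord:
    name: str
    use_case: str            # e.g. "pricing", "claims_triage", "fraud"
    status: str              # e.g. "new", "updated", "retired", "production"
    consumer_impact: bool    # direct consumer or material financial impact
    complaints: int          # consumer complaints linked to this model

def exhibit_a_summary(inventory):
    """Aggregate the counts an Exhibit A-style request might ask for."""
    active = [m for m in inventory if m.status != "retired"]
    return {
        "total_in_production": len(active),
        "by_status": dict(Counter(m.status for m in inventory)),
        "consumer_impacting": sum(m.consumer_impact for m in active),
        "with_complaints": sum(m.complaints > 0 for m in active),
    }

inventory = [
    ModelRecord("auto_rating_v3", "pricing", "updated", True, 0),
    ModelRecord("claims_triage_v1", "claims_triage", "production", True, 2),
    ModelRecord("legacy_fraud_score", "fraud", "retired", False, 0),
]
print(exhibit_a_summary(inventory))
```

The point of the sketch is the shape of the data, not the tooling: once every production model is a record like this, an Exhibit A request becomes a query rather than a scramble.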
The second gap is the high-risk classification methodology. Exhibit C lets the company set its own definition of "high risk," which is a meaningful concession by the Working Group, but only if the company has actually documented its methodology and applied it consistently. A defensible high-risk classification requires written criteria, an inventory of which models meet those criteria, and a record of when each model was last reviewed against them. Actuaries running pricing and reserving models should be involved in this exercise even if their models would not be classified as high risk under most definitions, because the boundaries of regulatory interest are still being drawn.
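A defensible classification is easier to demonstrate when the written criteria are applied mechanically rather than case by case. The sketch below shows one way to encode such criteria; the criteria names, the two-criterion threshold, and the model fields are hypothetical, since the current Exhibit C draft leaves the "high risk" definition to each carrier.

```python
from datetime import date

# Hypothetical written criteria; each carrier defines its own boundary
# under the current Exhibit C draft.
HIGH_RISK_CRITERIA = {
    "consumer_facing": lambda m: m["consumer_impact"],
    "adverse_action": lambda m: m["can_deny_or_price"],
    "limited_human_review": lambda m: not m["human_in_loop"],
}

def classify(model):
    """Record which written criteria a model meets, plus the verdict."""
    met = [name for name, check in HIGH_RISK_CRITERIA.items() if check(model)]
    return {
        "model": model["name"],
        "criteria_met": met,
        "high_risk": len(met) >= 2,  # assumed threshold, not an NAIC rule
        "reviewed_on": date.today().isoformat(),
    }

triage = {"name": "claims_triage_v1", "consumer_impact": True,
          "can_deny_or_price": True, "human_in_loop": True}
result = classify(triage)
```

Because each classification records the criteria met and the review date, the output doubles as the documentation trail a regulator challenging the boundary would ask to see.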
The third gap is third-party model documentation. Exhibit B specifically asks about the use and oversight of AI system vendors, and Exhibit D asks about the lineage of external data feeding internal models. Many carriers have contracts with vendors that do not include the model documentation rights the NAIC is now expecting carriers to be able to surface. Renegotiating those contracts at scale is a multi-year effort, but the gap closes faster if it starts now. Reading the existing AI Governance Gap analysis we published earlier this year alongside the Evaluation Tool exhibits shows where the most common contractual gaps appear.
The fourth gap is the audit trail for model validation. Exhibit C asks how high-risk models were tested and how they are reviewed for compliance. The Model Bulletin already expects insurers to test for bias and discrimination, but the NAIC's own May 2025 health insurance survey found that nearly one-third of health insurers do not regularly test their models for bias. Whether or not a carrier is in scope for the pilot, that statistic is the kind of thing future model laws will be drafted to address. Building a documented validation cadence now is cheaper than retrofitting one under exam pressure later.
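A documented validation cadence reduces to two records: what was tested when, and when the next test is due. Here is a minimal sketch under assumed conventions; the field names, the 12-month cadence, and the test types are all illustrative.

```python
from datetime import date, timedelta

# Assumed annual cadence; a real program would set this per model tier.
CADENCE_DAYS = 365

def log_validation(model_name, test_type, passed, run_date):
    """One audit-trail entry for a validation run."""
    return {
        "model": model_name,
        "test_type": test_type,  # e.g. "bias", "stability", "drift"
        "passed": passed,
        "run_date": run_date.isoformat(),
        "next_due": (run_date + timedelta(days=CADENCE_DAYS)).isoformat(),
    }

def overdue(entries, as_of):
    """Models whose most recent validation is past its due date."""
    latest = {}
    for e in sorted(entries, key=lambda e: e["run_date"]):
        latest[e["model"]] = e  # keep the newest entry per model
    return [m for m, e in latest.items()
            if date.fromisoformat(e["next_due"]) < as_of]

entries = [
    log_validation("auto_rating_v3", "bias", True, date(2025, 1, 15)),
    log_validation("claims_triage_v1", "bias", True, date(2026, 2, 1)),
]
print(overdue(entries, as_of=date(2026, 3, 1)))  # → ['auto_rating_v3']
```

An `overdue` check run on a schedule is the difference between a validation cadence that exists on paper and one that can be evidenced under exam.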
The final consideration is staffing. Responding to an Evaluation Tool request will require actuarial, IT, legal, and compliance time across multiple weeks, and the work cannot be fully delegated to a vendor because much of what regulators are asking for is internal documentation that only the company can produce. Patterns we have seen in early pilot responses suggest carriers that designated a single coordinator for the response handled the process more cleanly than those that distributed it across functions.
Where This Is Headed
The most likely path forward, based on the comments the Working Group has received and the way Commissioner Humphreys has been framing the open questions, is that the Evaluation Tool gets refined through the pilot, adopted in some form at the November 2026 Fall National Meeting, and then becomes a standard reference for regulators conducting market conduct and financial exams in states that adopt it. Whether a model law follows is a separate question, and the Working Group has indicated it will use pilot feedback to decide. NCOIL's narrower model act will continue moving on its own track and may be adopted in some states regardless of what the NAIC does.
What is already clear is that the era of AI being treated as an unregulated implementation detail inside insurance operations is over. The Evaluation Tool, even in its pilot form, formalizes the expectation that AI governance documentation will be examined the same way reserving methodology and rate filings have always been examined. For actuarial teams that have been working on AI governance for the past two years, the pilot validates that work. For teams that have not, the next 18 months are the closing window to catch up before catching up becomes a regulatory matter rather than a strategic one.
Sources
- NAIC Big Data and Artificial Intelligence (H) Working Group
- NAIC Insurance Topics: Artificial Intelligence
- NAIC Request for Information: AI Model Law (May 2025)
- NAIC Model Bulletin: Use of Artificial Intelligence Systems by Insurers (December 2023)
- NAIC AI Systems Evaluation Tool Draft 4.0
- NAIC AI Systems Evaluation Tool Pilot Project Summary
- NAIC BDAIWG July 16, 2025 Meeting Minutes
- NAIC BDAIWG September 29, 2025 Meeting Minutes
- InsuranceNewsNet: NAIC's 2026 AI Evaluation Pilot Moves Ahead as Industry Balks
- Fenwick: NAIC Expands AI Systems Evaluation Tool Pilot Program to 12 States
- Monitaur: NAIC AI Systems Evaluation Tool Pilot, A Guide for Insurers
- Foley & Lardner via Mondaq: What To Do If You Receive a NAIC AI Systems Evaluation Tool Pilot Request
- Carlton Fields: NAIC Big Data and Artificial Intelligence Working Group Conceptualizes Tools
- S&P Global Market Intelligence: NAIC Membership Divided on Developing AI Model Law
- American Academy of Actuaries: Comment Letter on AI Model Law RFI
- Leader's Edge: Charting the Course of AI Governance
- Fenwick: Tracking the Evolution of AI Insurance Regulation
- NCOIL: Committee Working Drafts (NCOIL Model Act Regarding Insurers' Use of Artificial Intelligence)
- McDermott Will & Emery: An Update on NAIC's Consideration of AI Model Law for Insurers