NAIC AI Model Bulletin Gets a Compliance Report Form

NAIC staff at the Spring 2026 National Meeting circulated a draft compliance report that would translate the December 2023 AI Model Bulletin into nine specific disclosure components for carriers to complete and attest to. The bulletin has been adopted by 24 states and the District of Columbia (NAIC State Map), but it has never carried a standardized template for proving compliance.

Four more states have issued parallel AI guidance (Quarles), yet no state has published a structure for documenting that a carrier's AI program satisfies the bulletin. The Big Data and Artificial Intelligence (H) Working Group is now building that template, with its next formal step at a July 22, 2026 public meeting (NAIC BDAI Working Group).

Where the Model Bulletin Stops Short

The Model Bulletin tells carriers to maintain a written AI program covering five functional domains (NAIC Model Bulletin, December 2023): AI governance and accountability, risk management and internal controls, third-party vendor oversight, consumer transparency and notice, and responsiveness to regulatory inquiries. Within each domain it describes the type of information a regulator may request during an examination, and then it stops. It specifies no format, no required data fields, no threshold for a sufficiently documented program, and no definition of how a carrier demonstrates compliance rather than merely asserting it.

That ambiguity is a deliberate feature of principles-based regulation. It lets carriers build programs proportionate to their size and risk, avoids locking in documentation standards that technology will outpace, and lets regulators adapt expectations as practice evolves. The trade-off is an enforcement gap: a carrier can point to a written policy, a quarterly governance committee, and a vendor management clause in its procurement contracts and claim compliance without producing a single data point on how any specific system performs, what data trained it, or whether its outputs have been tested for disparate impact across protected classes. Early reviews from the 12 pilot states (Swept AI, 2026) confirm the gap is real, not hypothetical. Examiners arriving for an AI governance review keep hearing "we have a policy" in response to questions the bulletin's framework implies should produce evidence: bias testing results, model version histories, complaint resolution records tied to AI-driven decisions, and written explainability protocols for adverse actions. A report form with defined fields forces carriers to generate that evidence instead of gesturing toward it.

The Nine Disclosure Components

The draft structure NAIC staff presented at the Spring 2026 meeting (Mayer Brown, April 2026) is the most operationally specific output of any NAIC AI governance effort to date, because it translates the bulletin's five domain descriptions into nine disclosure components that a carrier's legal, actuarial, and compliance functions would jointly complete. They are: an executive summary establishing the scope and geographic reach of the AI program; a board and senior management attestation confirming oversight responsibility; a models and data sources inventory covering both internal training data and external purchased data, with separate disclosure of selection bias controls and design constraints for each; a risk assessment framework describing the methodology for classifying each system by risk tier; a model cards section providing structured technical documentation per in-scope system; a corporate governance narrative showing the reporting chain from model developers to the board; a model drift and validation section covering monitoring frequency, performance thresholds, and retraining triggers; a protected class inference and bias testing section reporting methodology and results by system; and a consumer complaint disclosure documenting how AI-influenced adverse decisions are flagged, explained, and appealed.
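A compliance team preparing for the form might track the nine components as a checklist keyed by component, recording the evidence behind each. The sketch below is hypothetical: the NAIC draft publishes no machine-readable schema, so the identifiers in NINE_COMPONENTS and the ComplianceReportDraft class are illustrative names, not NAIC field names.

```python
from dataclasses import dataclass, field

# Hypothetical identifiers for the nine draft components, in report order.
# These are illustrative labels, not NAIC-defined field names.
NINE_COMPONENTS = (
    "executive_summary",
    "board_attestation",
    "models_and_data_sources_inventory",
    "risk_assessment_framework",
    "model_cards",
    "corporate_governance_narrative",
    "model_drift_and_validation",
    "protected_class_inference_and_bias_testing",
    "consumer_complaint_disclosure",
)


@dataclass
class ComplianceReportDraft:
    """A carrier's working draft: component name -> evidence references."""
    evidence: dict[str, list[str]] = field(default_factory=dict)

    def add_evidence(self, component: str, reference: str) -> None:
        # Reject typos so evidence is never filed under a non-existent component.
        if component not in NINE_COMPONENTS:
            raise ValueError(f"unknown component: {component}")
        self.evidence.setdefault(component, []).append(reference)

    def gaps(self) -> list[str]:
        """Components with no supporting evidence yet, in report order."""
        return [c for c in NINE_COMPONENTS if not self.evidence.get(c)]


draft = ComplianceReportDraft()
draft.add_evidence("board_attestation", "2026-03 board resolution")
draft.add_evidence("models_and_data_sources_inventory", "AI registry export")
print(len(draft.gaps()))  # 7 components still lack evidence
```

The point of the structure is the gap list: a component with a policy document but no dated evidence reference still shows up as open.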

The executive summary and board attestation add an accountability layer the bulletin leaves implicit. A senior officer signs the report, creating a named chain from the AI program to a specific executive who vouches for its completeness, a meaningful change from a policy that sits unsigned in a compliance manual. Board-level AI governance disclosure already appears in the proxies of exchange-listed carriers: AIG's March 2026 proxy, for example, describes a Global AI Policy, an AI Advisory Council, and a four-tier governance hierarchy (AIG proxy, March 2026). The compliance report would convert that kind of disclosure from an investor relations choice into a regulatory filing requirement.

How the Four-Exhibit Evaluation Tool Maps to the Report

The 12-state AI Systems Evaluation Tool pilot, running from March through September 2026 (Fenwick, 2026), is the report's regulatory counterpart. The compliance report is what a carrier files proactively to demonstrate an adequate program; the evaluation tool is what state examiners use during market conduct and financial reviews to test whether the carrier can actually produce the evidence behind those representations. The tool's four exhibits map onto the report's nine components, and reading them together clarifies what the full documentation package demands.

Exhibit A asks carriers to quantify their AI footprint (how many systems, across which functions, affecting what volume of decisions) and maps to the report's models and data sources inventory. The scope question is harder than it sounds. Vendors embed machine learning inside underwriting workbenches, claims triage platforms, fraud detection systems, and customer service tools, and carriers rarely track centrally which of those fall under the NAIC definition. The tool explicitly reaches "vendor-embedded models, automated decision components inside larger platforms, and machine learning features that nobody in the organization categorizes as AI." A carrier nominally running three vendor platforms and two internal models may have twelve systems in scope once embedded components are counted.
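The three-platforms-becomes-twelve arithmetic can be made concrete with a small registry. This is a hypothetical sketch: the AISystem type, the system names, and the choice to count each vendor platform as well as the components it embeds are assumptions for illustration, not how the tool defines a "system".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AISystem:
    name: str
    source: str                   # "vendor" or "internal"
    parent: Optional[str] = None  # vendor platform embedding this component, if any


def nominal_count(registry: list[AISystem]) -> int:
    """What a carrier often reports: top-level platforms and models only."""
    return sum(1 for s in registry if s.parent is None)


def full_scope_count(registry: list[AISystem]) -> int:
    """An Exhibit A-style count that also includes vendor-embedded components."""
    return len(registry)


# Hypothetical registry: three vendor platforms, two internal models,
# and seven ML components discovered inside the vendor platforms.
registry = [
    AISystem("underwriting workbench", "vendor"),
    AISystem("claims triage platform", "vendor"),
    AISystem("customer service suite", "vendor"),
    AISystem("internal pricing model", "internal"),
    AISystem("internal retention model", "internal"),
    AISystem("prefill risk score", "vendor", parent="underwriting workbench"),
    AISystem("appetite classifier", "vendor", parent="underwriting workbench"),
    AISystem("severity predictor", "vendor", parent="claims triage platform"),
    AISystem("fraud flag", "vendor", parent="claims triage platform"),
    AISystem("document extraction", "vendor", parent="claims triage platform"),
    AISystem("intent classifier", "vendor", parent="customer service suite"),
    AISystem("chat assistant", "vendor", parent="customer service suite"),
]

print(nominal_count(registry), full_scope_count(registry))  # 5 12
```

Recording the parent platform, rather than flattening everything into one list, keeps the vendor relationship visible when the same registry later drives high-risk classification and contract review.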

Exhibit B addresses governance risk assessment, in narrative or checklist form, and maps to the report's governance, risk assessment, and board attestation components. The distinction pilot states have flagged is between governance infrastructure and governance theater. A policy describing an AI risk committee, plus the committee's quarterly minutes, does not prove the committee functionally assesses system risk. Real evidence is the output the committee produces: minutes referencing specific system performance metrics, escalation records for systems that breached risk thresholds, documented remediation for identified gaps. Exhibit B is built to surface whether those outputs exist, not merely the structures meant to generate them.
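The evidence-over-structure test can be sketched as a reconciliation between a monitoring log and the committee's escalation records. The ThresholdBreach type, the metric names, and the rule that an escalation must match on system, metric, and month are hypothetical assumptions, not requirements stated in Exhibit B.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdBreach:
    system: str
    metric: str
    month: str  # "YYYY-MM"


def unescalated(breaches: list[ThresholdBreach],
                escalations: set[tuple[str, str, str]]) -> list[ThresholdBreach]:
    """Return breaches with no matching (system, metric, month) escalation record.

    Each result is a point where a threshold was crossed but the governance
    structure left no output behind.
    """
    return [b for b in breaches if (b.system, b.metric, b.month) not in escalations]


# Hypothetical monitoring log and committee escalation records.
breaches = [
    ThresholdBreach("claims severity model", "MAPE", "2026-02"),
    ThresholdBreach("underwriting score", "AUC", "2026-03"),
    ThresholdBreach("underwriting score", "approval-rate gap", "2026-04"),
]
escalations = {
    ("claims severity model", "MAPE", "2026-02"),
    ("underwriting score", "AUC", "2026-03"),
}
for b in unescalated(breaches, escalations):
    print(f"{b.system}: {b.metric} breach in {b.month} has no escalation record")
```

A committee with quarterly minutes but a non-empty result here is the "governance theater" pattern pilot states describe.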

Exhibit C is the most demanding exhibit and the one where carrier gaps are widest. It applies to high-risk systems, those used in underwriting, pricing, claims handling, fraud detection, and any function where AI outputs materially affect consumer access to or cost of coverage. For each, carriers must document model design and architecture, training data composition and vintage, validation procedures and performance metrics, bias testing methodology and results by protected class, and sample case files showing how outputs contributed to specific decisions. Third-party vendor models are fully in scope, with the carrier on the hook for documentation it must obtain from the vendor. A carrier that bought a GLM-based pricing model in 2021 and a gradient-boosted underwriting score in 2023 owes Exhibit C documentation for both, including the vendor's bias testing results, regardless of whether the contract obligates the vendor to provide them.
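The Exhibit C scoping rule reduces to a predicate over each system's function. A hedged sketch, assuming the carrier records each system's business function and a yes/no judgment on whether outputs materially affect consumer access to or cost of coverage (both hypothetical inputs); the item labels paraphrase the exhibit's list and are not official field names.

```python
# Functions the text above names as high-risk for Exhibit C.
HIGH_RISK_FUNCTIONS = frozenset({"underwriting", "pricing", "claims handling", "fraud detection"})

# Documentation owed per high-risk system (paraphrased, illustrative labels).
EXHIBIT_C_ITEMS = (
    "model design and architecture",
    "training data composition and vintage",
    "validation procedures and performance metrics",
    "bias testing methodology and results by protected class",
    "sample case files",
)


def is_high_risk(function: str, affects_access_or_cost: bool) -> bool:
    """High-risk if in a named function, or if outputs materially affect access or cost."""
    return function in HIGH_RISK_FUNCTIONS or affects_access_or_cost


def exhibit_c_worklist(systems: dict[str, tuple[str, bool]]) -> dict[str, tuple[str, ...]]:
    """Map each high-risk system to the items it owes; vendor and internal alike."""
    return {
        name: EXHIBIT_C_ITEMS
        for name, (function, affects) in systems.items()
        if is_high_risk(function, affects)
    }


# Hypothetical systems: (business function, materially affects access or cost?)
systems = {
    "2021 GLM pricing model (vendor)": ("pricing", True),
    "2023 gradient-boosted underwriting score (vendor)": ("underwriting", True),
    "mailroom document sorter": ("operations", False),
    "renewal discount optimizer": ("marketing", True),
}
worklist = exhibit_c_worklist(systems)
print(sorted(worklist))
```

The catch-all clause is the one that pulls systems outside the named functions, such as the hypothetical discount optimizer, into scope.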

Exhibit D covers the data layer: sources, quality controls, representativeness testing, and discrimination risk assessment. It maps to the inventory fields that require disclosure of selection bias in internal datasets and design constraints in external data purchases. The NAIC singles out aerial imagery, social media data, and purchasing behavior data as categories warranting explicit discrimination risk analysis at the data layer, separate from the output-level bias testing required under Exhibit C.

Why the Tool Surfaces in Exams Carriers Already Face

The 12 pilot states are deploying the evaluation tool inside existing market conduct examinations, financial examinations, and financial analysis, rather than standing up a separate AI examination track. That matters because AI governance documentation can now appear in reviews carriers undergo for unrelated reasons. A property and casualty market conduct exam triggered by personal-auto complaint volume could fold in Exhibit C requests for pricing and underwriting systems; a life insurer's financial examination focused on reserve adequacy and risk-based capital (RBC) ratios could pull in Exhibit B and D requests for underwriting and claims triage models.

Whether the tool acts as an information-gathering instrument or an enforcement framework depends on how each state handles findings, and the pilot states have not committed to a uniform outcome. Some have signaled that findings will inform risk profiling for future examination scheduling; others reserve the right to treat material documentation deficiencies as findings requiring corrective action. That ambiguity is intentional during the pilot. The NAIC will collect data through September 2026, update the tool in October, re-expose it for public comment, and bring it to a vote at the Fall National Meeting in November 2026, with the adopted version carrying clearer guidance on how findings translate to outcomes.

The pilot also tests whether the tool can be applied without crushing smaller carriers. A "principle of proportionality" in its design directs examiners to prioritize high-risk systems and scale expectations to a carrier's size and deployment complexity. A regional personal lines carrier running a vendor credit-based insurance score and a rules-based claims router faces a very different burden than a national multiline carrier with internal ML across pricing, underwriting, fraud, and claims. The open question the pilot will answer is whether proportionality, in practice, scales expectations down for smaller insurers or effectively exempts them, leaving the requirements on the largest carriers while smaller insurers' AI governance goes unexamined.

The Reinsurer Questionnaire Already in the Field

Set the draft report categories beside the AI governance questionnaires leading reinsurers have folded into cedant due diligence over the past 18 months and the overlap is striking. The fields the report would require in its model cards, bias testing, and governance attestation sections closely track what Munich Re, Swiss Re, and Gen Re now request when reviewing cedant AI programs for treaty placement and pricing.

Reinsurers act on a commercial incentive the NAIC lacks: they absorb a share of the adverse development that flows from underwriting model errors, and they have to price that risk. Their questionnaires ask cedants to identify systems by business function, document training data vintage and refreshment cadence per system, produce bias testing segmented by protected class and geography, and describe the human review protocols that override or qualify model outputs in the underwriting and claims workflow. A carrier whose treaty already requires annual AI governance certification is building much of the infrastructure the compliance report would demand. The report form standardizes that infrastructure across the market instead of leaving it a bespoke arrangement between each carrier and its reinsurance panel. The convergence is happening independently of the regulatory calendar: carriers that negotiated AI disclosure into their treaties have a head start, and those that have not should expect regulatory and commercial pressure to arrive on roughly the same timeline.

Three Documentation Gaps Carriers Will Hit

The distance between what the report would require and what most carriers can produce today shows up in three areas, each demanding a different operational response.

The first is the model inventory gap. Exhibit A and the report's inventory section both require a complete accounting of every in-scope system, yet most carriers keep no centralized registry that captures vendor-embedded components alongside internal models. Without one, a carrier cannot answer Exhibit A accurately, cannot establish which systems are high-risk under Exhibit C, and cannot attest in the executive summary that the report covers the program's full scope. Building the registry requires actuarial, IT, underwriting, claims, and legal to engage at once, because each function's AI is often owned by a different team with no central steward. For a mid-sized carrier starting from scratch, that is a six-to-twelve-month build.

The second is vendor contract coverage. Many carrier-vendor contracts signed before 2024 predate the tool's documentation requirements and grant the carrier no right to model documentation, bias testing results, training data composition, or demographically segmented performance data. When Exhibit C asks about a vendor pricing model, the carrier needs the vendor's documentation, and where the contract does not require it, the only compliance path is renegotiation. Pilot examinations have surfaced this repeatedly: carriers cite "vendor" as the reason they cannot produce Exhibit C material, and the pilot states have been clear that managing vendor relationships is the carrier's responsibility, not grounds for examination deference.
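A first pass at the renegotiation workload is a set difference between the documentation rights the tool effectively requires and the rights each contract grants. This is a hypothetical sketch: the four rights come from the paragraph above, but the contract inventory and the idea of reducing each contract to a set of granted rights are illustrative simplifications of what a legal review would produce.

```python
# The four documentation rights named above.
REQUIRED_RIGHTS = frozenset({
    "model documentation",
    "bias testing results",
    "training data composition",
    "demographically segmented performance data",
})


def renegotiation_list(contracts: dict[str, set[str]]) -> dict[str, list[str]]:
    """For each vendor contract, the required rights it does not grant.

    Contracts that already grant every required right are omitted.
    """
    gaps: dict[str, list[str]] = {}
    for vendor, granted in contracts.items():
        missing = sorted(REQUIRED_RIGHTS - granted)
        if missing:
            gaps[vendor] = missing
    return gaps


# Hypothetical contract inventory: vendor -> rights the contract grants.
contracts = {
    "pricing vendor (signed 2021)": set(),
    "underwriting score vendor (signed 2023)": {"model documentation"},
    "claims platform (signed 2025)": set(REQUIRED_RIGHTS),
}
for vendor, missing in renegotiation_list(contracts).items():
    print(vendor, "->", len(missing), "rights missing")
```

Sorting the output by the number of missing rights, or by the risk tier of the system each contract covers, gives a rough order for renegotiation.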

The third is evidence generation versus policy maintenance. The report's board attestation, model drift, governance narrative, and complaint sections all demand continuous evidence rather than static documentation. A carrier can draft an AI risk policy in a day; demonstrating that the policy is operationally live, that the governance committee reviewed a model's drift metrics in March and documented a remediation decision in April, takes months of behavior change before the records exist. Carriers that start generating that evidence now, before the form is expected to be finalized in 2027, will file documentation that reflects genuine governance. Those that wait and then assemble the records retroactively will struggle to present them as anything other than examination preparation.
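What "continuous evidence" looks like for drift can be sketched as a dated review record produced each time monitoring runs. This sketch swaps in the population stability index (PSI), a drift metric common in credit and insurance scoring, and a 0.25 alert threshold that is an industry rule of thumb; neither is prescribed by the NAIC draft, and the DriftReview record and score-band shares are hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import date


def population_stability_index(expected: list[float], actual: list[float],
                               eps: float = 1e-6) -> float:
    """PSI between two binned score distributions given as bin shares."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must use the same bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # floor empty bins to avoid log(0)
        total += (a - e) * math.log(a / e)
    return total


@dataclass(frozen=True)
class DriftReview:
    """A dated record showing that monitoring ran and what was decided."""
    system: str
    reviewed_on: date
    psi: float
    threshold: float
    breached: bool
    decision: str


def record_drift_review(system: str, reviewed_on: date, expected: list[float],
                        actual: list[float], threshold: float = 0.25) -> DriftReview:
    psi = population_stability_index(expected, actual)
    breached = psi > threshold
    decision = "escalate to governance committee" if breached else "no action"
    return DriftReview(system, reviewed_on, round(psi, 4), threshold, breached, decision)


baseline = [0.25, 0.25, 0.25, 0.25]  # score-band shares at validation (hypothetical)
march = [0.05, 0.15, 0.30, 0.50]     # score-band shares in production (hypothetical)
review = record_drift_review("underwriting score", date(2026, 3, 31), baseline, march)
print(review.psi, review.breached, review.decision)
```

A series of such records, including the "no action" months, is the kind of trail that distinguishes a live policy from one assembled before an exam.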

The Calendar to 2027 and 2028

The NAIC's path from here is concrete. The 12-state pilot runs through September 2026; staff will revise the tool based on pilot feedback in September and October, then re-expose it for comment. The Working Group's July 22 public meeting, where an actuarial panel on AI governance trends and the report structure update share the agenda, is the last major session before the fall cycle. The revised tool goes to a vote at the Fall National Meeting in November 2026, with adoption expected to put it in every state insurance department's hands for 2027 market conduct and financial examinations.

The report structure tracks slightly behind. The Denver draft circulated at the Spring National Meeting is a working document, not an exposure draft; it will be refined over the summer on working group input and pilot findings, exposed for comment in the fall alongside the revised tool, and likely finalized in the first half of 2027 for use across the 24-plus adopting states. The first carriers required to file a structured AI compliance report under a standardized template are almost certainly looking at a 2027 requirement in the pilot states, with broader rollout into 2028 as more states adopt both the bulletin and the report.

That leaves a roughly 18-month window, and it is not free time. Model inventories take 6 to 12 months to build accurately. Vendor renegotiations, especially for multi-year treaty arrangements with embedded AI, run on multi-quarter timelines. Bias testing infrastructure is the deepest lift: for gradient-boosted tree models or neural networks, disparate impact testing requires output-based demographic analysis rather than variable-level review, forcing model pipeline changes that actuarial, IT, and governance teams have to coordinate. None of these workstreams compresses well under filing pressure.
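Output-based testing works on model decisions rather than on input variables, which is why it applies to gradient-boosted trees and neural networks alike. A minimal sketch follows, assuming group labels are already available (for example, from the report's protected class inference step, whose method is out of scope here). It swaps in the adverse impact ratio with the four-fifths (0.8) threshold from US employment-selection guidance; that heuristic is not an NAIC standard, and the data are hypothetical.

```python
from collections import defaultdict


def adverse_impact_ratios(outcomes: list[tuple[str, bool]]) -> dict[str, float]:
    """Each group's favorable-outcome rate divided by the highest group's rate.

    Uses only (group, model decision) pairs, so the test is identical whether
    the model is a GLM, a gradient-boosted ensemble, or a neural network.
    """
    totals: dict[str, int] = defaultdict(int)
    favorable: dict[str, int] = defaultdict(int)
    for group, is_favorable in outcomes:
        totals[group] += 1
        favorable[group] += int(is_favorable)
    rates = {g: favorable[g] / totals[g] for g in totals}
    best = max(rates.values())
    if best == 0:
        raise ValueError("no group received a favorable outcome")
    return {g: rate / best for g, rate in rates.items()}


def flag_groups(ratios: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Groups whose ratio falls below the threshold (four-fifths heuristic by default)."""
    return sorted(g for g, r in ratios.items() if r < threshold)


# Hypothetical scored decisions with inferred group labels.
outcomes = ([("group_a", True)] * 80 + [("group_a", False)] * 20
            + [("group_b", True)] * 60 + [("group_b", False)] * 40)
ratios = adverse_impact_ratios(outcomes)
print(flag_groups(ratios))  # ['group_b']
```

The pipeline change the paragraph describes is mostly upstream of this function: capturing decisions with stable identifiers so they can be joined to inferred group labels at all.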

The Nearer Story Than the Model Law

Most trade-press coverage has fixated on whether the Working Group will escalate from a principles-based bulletin to an enforceable Model Law, which would require state legislative adoption and carry statutory penalties. That question is genuinely open: 33 comment letters on the NAIC's Model Law request for information (NAIC, 2025) exposed deep industry disagreement on scope, vendor liability, and what counts as a covered AI system, and a Model Law, if it advances, is years from widespread adoption.

The compliance report is the nearer-term and more operationally consequential development precisely because it needs none of that. It requires no Model Law and no new legislation; it operates within the examination authority states already hold under their adopted versions of the bulletin and existing market conduct statutes. A state that has adopted the bulletin can, during a market conduct exam, ask a carrier to complete the report form as evidence of program adequacy. Once the NAIC standardizes the form, states will use it whether or not a Model Law passes, because it hands examiners a structured way to run AI governance reviews without building their own framework.

The same form doubles as an internal diagnostic. Walking through the nine components (including the inventory, the governance attestation, and the bias testing section) and the vendor documentation fields surfaces a carrier's gaps more systematically than an internal audit, and a dry run against the Denver draft will expose those gaps 12 to 18 months before they become an examination finding. The discipline is the familiar actuarial one of modeling the exposure before the loss event, not after.