Mapping the Academy's AI Use Case Brief to Actuarial Review Duties

Survey data the NAIC collected from auto, home, life, and health carriers shows AI adoption running between 58 and 92 percent across major lines of business, with health insurers at the high end and life insurers at the low. Underwriting, pricing, claims, and marketing are the functions where deployment is most concentrated. The American Academy of Actuaries' Data Science and Analytics Committee published its AI Use Cases in Insurance and Pension issue brief on June 11, 2026, and it is the most systematic professional inventory of those applications published to date. It names use cases across P&C underwriting, life and health insurance, claims adjudication, fraud detection, and pension and retirement functions. The problem is that none of the carriers in that 92-percent cohort are being examined on whether they can name their AI applications. They are being examined on whether they can prove, with documentation produced on short notice, who reviewed each system before deployment, what testing was done, and what evidence was retained. The Academy brief is the right starting point. A control map is what actuarial teams actually need.

Comparing professional guidance documents with the documentation requests emerging from market conduct examinations in Colorado, New York, and the twelve states participating in the NAIC AI evaluation tool pilot reveals a consistent gap: carriers can list their AI systems, but the evidence chain connecting each system to a named reviewer, a pre-deployment test record, and a set of retained workpapers is often incomplete or absent. Read in that context, the Academy brief is most useful as a prompt for building a per-function control inventory rather than as a finished governance document.

The Brief and the Scope It Sets

The Academy's June 2026 brief, produced by its Data Science and Analytics Committee, covers use cases across several broad categories: risk selection and underwriting, rating and pricing, claims processing and adjudication, fraud detection and prevention, marketing and distribution, operational efficiency, and pension and retirement plan applications. The brief draws on the committee's ongoing work, including the August 2023 issue brief on discrimination considerations for machine learning models, the Academy's February 2025 comment letter to the International Association of Insurance Supervisors on AI supervision, and articles in Contingencies tracking how actuaries are engaging with AI governance questions. The framing throughout is descriptive and professional: here are the domains where AI is being deployed; here is what actuaries should understand about each.

What the brief does not provide, by design, is a control taxonomy. That is not a criticism. Issue briefs document the state of practice; they do not prescribe implementation-level compliance frameworks. But actuarial teams sitting across the table from a state examiner need exactly that implementation-level taxonomy: which use cases require independent validation before deployment, which require ongoing monitoring reports, which generate adverse action notice obligations, and which third-party vendor records need to live in the actuarial workpaper versus the legal file.

The following sections build that taxonomy from the use case categories the Academy brief establishes, mapped against the documentation framework in the NAIC Model Bulletin on the Use of Artificial Intelligence Systems by Insurers (adopted December 2023, now in force in 24 states and the District of Columbia) and the specific examination protocols emerging from the evaluation tool pilot.

Grouping Use Cases by Decision Impact

The Academy brief organizes use cases by business function. For control purposes, a different grouping is more useful: by the potential for adverse consumer outcomes, which is the explicit standard the NAIC bulletin uses to calibrate required oversight. Controls must be "commensurate with both the risk of Adverse Consumer Outcomes and the Degree of Potential Harm to Consumers." Four tiers follow naturally from that standard.

Tier	Decision Type	Example Applications	Regulatory Exposure
1: Adverse Action	Coverage denial, claim denial, policy cancellation or non-renewal	Automated underwriting declinations; AI-assisted claim denials; churn-prediction-driven non-renewal	Adverse action notice requirements; state human-in-the-loop laws; NAIC bulletin Section 4 examination readiness
2: Pricing and Classification	Rate relativities, territorial factors, individual pricing decisions	ML-based territorial segmentation; gradient-boosted pricing models; telematics scoring	State rate filing review; Colorado four-fifths rule compliance (annual report due July 1); bias testing disclosure requirements
3: Triage and Routing	Queue management, severity scoring, referral decisions	Claims triage scoring; underwriting referral queuing; SIU referral models	Indirectly: outcomes from routing decisions feed into Tier 1 and 2 exposure; disparate routing outcomes are examinable
4: Internal Analytics	Reserve trend analysis, experience studies, pension liability modeling	ML-assisted loss development; mortality assumption development; asset allocation optimization	ASOP No. 56 documentation; ERISA fiduciary standard for pension; no direct consumer exposure, but actuary sign-off required

Tier 1 and 2 use cases require the most rigorous pre-deployment validation, independent review, and ongoing monitoring. Tier 3 use cases require controls calibrated to the downstream decisions they enable. Tier 4 use cases require actuarial documentation and professional judgment under existing ASOPs, with a heightened fiduciary layer for pension applications. The error actuarial teams make most frequently is treating all four tiers with the same documentation intensity, either doing too little for Tier 1 or doing full regulatory-grade validation for Tier 4 internal analytics that regulators are not examining at the same level.
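One way to keep tiering consistent across a large inventory is to encode it, so that every model's documentation intensity follows from its decision type rather than from whoever happens to review it. The sketch below is hypothetical: the decision-type labels and control names are illustrative stand-ins, not terms defined by the NAIC bulletin or the Academy brief, and the control lists are a simplified reading of the tier descriptions above.

```python
from enum import IntEnum
from typing import Iterable, List


class Tier(IntEnum):
    """Decision-impact tiers (lower number = higher consumer exposure)."""
    ADVERSE_ACTION = 1
    PRICING_CLASSIFICATION = 2
    TRIAGE_ROUTING = 3
    INTERNAL_ANALYTICS = 4


# Hypothetical decision-type labels; a carrier would substitute its own inventory vocabulary.
DECISION_TIER = {
    "coverage_denial": Tier.ADVERSE_ACTION,
    "claim_denial": Tier.ADVERSE_ACTION,
    "cancellation_or_nonrenewal": Tier.ADVERSE_ACTION,
    "rate_relativity": Tier.PRICING_CLASSIFICATION,
    "individual_pricing": Tier.PRICING_CLASSIFICATION,
    "queue_or_referral": Tier.TRIAGE_ROUTING,
    "reserve_or_experience_study": Tier.INTERNAL_ANALYTICS,
}


def assign_tier(decision_types: Iterable[str]) -> Tier:
    """A model feeding several decision types takes its most severe (lowest-numbered) tier."""
    tiers = [DECISION_TIER[d] for d in decision_types]  # KeyError flags an unmapped label
    if not tiers:
        raise ValueError("every inventoried model needs at least one decision type")
    return min(tiers)


def required_controls(tier: Tier) -> List[str]:
    """Illustrative control set scaled to tier; not a regulatory checklist."""
    controls = ["named_owner", "documentation"]
    if tier <= Tier.PRICING_CLASSIFICATION:
        controls += ["independent_validation", "bias_testing", "ongoing_monitoring"]
    elif tier == Tier.TRIAGE_ROUTING:
        controls += ["downstream_outcome_monitoring"]
    else:
        controls += ["actuarial_signoff"]
    return controls
```

Taking the most severe tier when a model feeds several decisions is a deliberately conservative choice: a triage model whose scores also drive denials gets documented at Tier 1 intensity, which is one concrete way of calibrating Tier 3 controls to the downstream decisions they enable.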

Control Duties by Function

Underwriting AI

Underwriting applications sit across Tier 1 and Tier 2 depending on whether the model outputs a declination decision or a pricing factor. The control duties differ accordingly. For risk selection models that feed into coverage eligibility decisions, the owner should be a named individual, typically an actuary or the chief underwriting officer, with clear accountability for the model; where an actuary performs or relies on the modeling work, ASOP No. 56 and any other applicable standards of practice govern that work. Pre-deployment testing must include performance on holdout data not used in training, disparate impact testing across the protected classes specified in each state’s bias testing requirements (nine classes under Colorado’s SB 21-169 framework, using the four-fifths rule as the threshold), and feature justification documentation connecting each model input to actuarial experience or a legitimate underwriting rationale. States with enhanced requirements, Colorado and New York foremost, should be expected to request that justification documentation as part of rate filing or market conduct submissions.
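The four-fifths comparison the text describes is simple arithmetic: each group's favorable-outcome rate is divided by the highest group's rate, and ratios below 0.8 are flagged. A minimal sketch, assuming group labels are already attached to each underwriting decision (how those labels are obtained or inferred is outside this example, and a state's prescribed methodology may add statistical tests this sketch omits):

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def selection_rates(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """records: (group, approved) pairs; returns the approval rate per group."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for group, ok in records:
        total[group] += 1
        approved[group] += int(ok)
    return {g: approved[g] / total[g] for g in total}


def four_fifths_check(records: Iterable[Tuple[str, bool]], threshold: float = 0.8):
    """Return {group: (impact_ratio, passes)} relative to the most-favored group."""
    rates = selection_rates(records)
    best = max(rates.values())
    if best == 0:
        raise ValueError("no group has any favorable outcomes; impact ratio undefined")
    return {g: (r / best, r / best >= threshold) for g, r in rates.items()}


# Hypothetical holdout decisions: group A approved 80 of 100, group B approved 60 of 100.
decisions = [("A", True)] * 80 + [("A", False)] * 20 + [("B", True)] * 60 + [("B", False)] * 40
print(four_fifths_check(decisions))  # group B's ratio is about 0.75, below 0.8
```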

For pricing models, the same pre-deployment testing applies with additional documentation of how model outputs map to filed rating factors. A gradient-boosted pricing model that generates individualized risk scores must document how those scores connect to the filed relativities in the actuarial rate indication. SHAP values or other feature attribution outputs are increasingly expected in filing supplements and examination responses, not as statistical curiosities but as the mechanism for translating what the model computed into language an examiner can evaluate against the filed rating plan.
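Feature attribution is what turns an individualized score back into rating-variable terms. The shap Python package is the usual tool; to keep the example self-contained, the sketch below instead computes exact Shapley values by enumerating every coalition of features, with absent features set to a single reference policy (a baseline). That is one of several SHAP variants, and the rating function, feature names, and relativities are hypothetical stand-ins for a fitted model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, Hashable


def shapley_attribution(model: Callable[[dict], float], x: dict, baseline: dict) -> Dict[Hashable, float]:
    """Exact Shapley values of model(x) relative to model(baseline), by coalition enumeration."""
    features = list(x)
    n = len(features)

    def value(coalition: frozenset) -> float:
        # Features in the coalition take the policy's value; the rest take the baseline value.
        blended = {f: (x[f] if f in coalition else baseline[f]) for f in features}
        return model(blended)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(len(others) + 1):
            # Standard Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_premium(p: dict) -> float:
    """Hypothetical multiplicative rating function standing in for a fitted pricing model."""
    territory_rel = {"urban": 1.25, "rural": 0.90}[p["territory"]]
    age_rel = 1.40 if p["driver_age"] < 25 else 1.00
    return 500.0 * territory_rel * age_rel * (1.0 + 0.01 * p["hard_brakes"])


policy = {"territory": "urban", "driver_age": 22, "hard_brakes": 10}
baseline = {"territory": "rural", "driver_age": 40, "hard_brakes": 0}
contributions = shapley_attribution(toy_premium, policy, baseline)
```

The attributions sum exactly to the premium difference between the policy and the baseline, which is what lets a reviewer reconcile a model score to rating-variable contributions an examiner can compare with the filed plan. Exact enumeration grows exponentially with the number of features, so it is practical only for a handful of features or grouped rating variables.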

Ongoing monitoring for underwriting AI should include quarterly review of adverse action rates by demographic segment, annual bias testing reports (Colorado requires submission by July 1 each year for auto and health lines), and model drift tracking at the class and territory level. Carriers that have not established drift detection thresholds in advance typically cannot demonstrate adequate monitoring when examiners ask how the carrier would know whether the model’s performance had degraded since deployment.
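A drift threshold is only defensible if it is written down before deployment. The sketch below uses the population stability index (PSI) on score-bin counts as one common drift statistic; the 0.10 and 0.25 cutoffs are a widely used industry rule of thumb, not a regulatory standard, and would be run separately for each class and territory segment.

```python
import math
from typing import Dict, Sequence

# Fixed before deployment; 0.10 / 0.25 is a common rule of thumb, not a regulatory requirement.
DRIFT_THRESHOLDS: Dict[str, float] = {"watch": 0.10, "revalidate": 0.25}


def population_stability_index(expected_counts: Sequence[int], actual_counts: Sequence[int],
                               eps: float = 1e-6) -> float:
    """PSI across pre-defined score bins; both count lists must use the same bin order."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("bin counts must align")
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)  # floor avoids log(0) when a bin is empty
        a_pct = max(a / a_total, eps)
        psi += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return psi


def drift_status(psi: float, thresholds: Dict[str, float] = DRIFT_THRESHOLDS) -> str:
    if psi >= thresholds["revalidate"]:
        return "revalidate"
    if psi >= thresholds["watch"]:
        return "watch"
    return "stable"


# Hypothetical deployment-time score distribution versus the current quarter for one territory
print(drift_status(population_stability_index([25, 25, 25, 25], [10, 20, 30, 40])))  # "watch"
```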

Claims AI

Claims adjudication models present a distinct control challenge because the adverse outcome is claim denial rather than premium pricing, which shifts the regulatory exposure. The NAIC's Spring 2026 National Meeting flagged claims handling for specific AI scrutiny, and several states have enacted or proposed human-in-the-loop requirements for AI-assisted coverage decisions. The practical control duties for claims AI include three items that are frequently missing from actuarial workpapers.

First, denial reason code completeness testing: where an AI model contributes to a claim denial, the documentation must establish that the reason codes provided to the claimant are accurate and traceable to the model’s actual decision logic, not generic codes assigned after the fact. Second, false positive rate monitoring by claimant demographic: SIU referral models, in particular, must track whether referral rates differ across demographic groups in ways that cannot be explained by legitimate claim characteristics. Third, settlement rate and reopened-claim rate tracking by AI score band, which is the outcome-level check on whether the model’s tier assignments are producing consistent downstream results. Pre-deployment testing for claims AI should include comparison to human adjuster outcomes on a holdout set of claims, with explicit review of cases where model and adjuster disagreed. Those disagreement cases are exactly what a state examiner will ask for during a market conduct review focused on AI-assisted claims handling.
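Two of these checks reduce to short calculations over claim-level records. The sketch below assumes hypothetical tuple layouts: for SIU referrals, each row carries a demographic group, whether the model referred the claim, and whether fraud was confirmed; for the holdout comparison, each row carries a claim ID and the model and adjuster decisions. The false positive rate is the share of non-fraud claims that were referred anyway.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def false_positive_rate_by_group(rows: Iterable[Tuple[str, bool, bool]]) -> Dict[str, float]:
    """rows: (demographic_group, referred_by_model, confirmed_fraud).

    False positive rate = referred but not confirmed / all claims not confirmed, per group.
    """
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for group, referred, confirmed in rows:
        if not confirmed:
            negatives[group] += 1
            false_pos[group] += int(referred)
    return {g: false_pos[g] / n for g, n in negatives.items()}


def disagreement_cases(rows: Iterable[Tuple[str, str, str]]) -> Tuple[List[str], float]:
    """rows: (claim_id, model_decision, adjuster_decision) on the holdout comparison set.

    Returns the claim IDs where model and adjuster disagreed, plus the overall agreement rate.
    """
    rows = list(rows)
    disagreed = [cid for cid, model, adjuster in rows if model != adjuster]
    agreement = 1.0 - len(disagreed) / len(rows) if rows else float("nan")
    return disagreed, agreement
```

Returning the disagreement claim IDs, not just the agreement rate, is the point: those are the files a reviewer pulls for case-by-case review and retains for examination.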

Reserving and Loss Estimation

Reserving applications are the domain where ASOP No. 56 has the clearest direct application, because the actuarial sign-off on a loss reserve opinion is a formal professional act. When an ML-assisted model contributes to a reserve indication, the opining actuary must satisfy ASOP 56’s requirement to "understand the model" at a level sufficient to assess its appropriateness. For a chain-ladder or Bornhuetter-Ferguson calculation, that understanding is assumed. For a gradient-boosted loss development model or a neural network applied to tail factor selection, it is not. The pre-deployment documentation must include: backtesting of the model against actual loss development on held-out accident years, comparison of model-implied LDFs against traditional methods across multiple development periods, sensitivity testing showing how the reserve indication moves under different model calibrations, and an explicit statement of which assumptions the actuary is relying on the model to supply versus which the actuary is supplying independently.
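The comparison of model-implied LDFs against traditional methods can be sketched with volume-weighted chain-ladder age-to-age factors as the benchmark. The triangle values, the model-implied factors, and the 2 percent relative tolerance below are hypothetical; a real comparison would also cover tail factors and the held-out accident years used for backtesting.

```python
from typing import List, Sequence, Tuple


def volume_weighted_ldfs(triangle: Sequence[Sequence[float]]) -> List[float]:
    """Chain-ladder age-to-age factors from a cumulative loss triangle.

    Each row is one accident year's cumulative losses by development age; later
    accident years have fewer entries.
    """
    n_ages = max(len(row) for row in triangle)
    ldfs = []
    for j in range(n_ages - 1):
        pairs = [(row[j], row[j + 1]) for row in triangle if len(row) > j + 1]
        ldfs.append(sum(b for _, b in pairs) / sum(a for a, _ in pairs))
    return ldfs


def compare_ldfs(model_ldfs: Sequence[float], traditional_ldfs: Sequence[float],
                 tolerance: float = 0.02) -> List[Tuple[int, float]]:
    """Flag development periods where the model differs from chain-ladder by more than tolerance (relative)."""
    flags = []
    for j, (m, t) in enumerate(zip(model_ldfs, traditional_ldfs)):
        rel_diff = m / t - 1.0
        if abs(rel_diff) > tolerance:
            flags.append((j, round(rel_diff, 4)))
    return flags


triangle = [[100, 150, 165], [110, 176], [120]]  # hypothetical cumulative paid losses
chain_ladder = volume_weighted_ldfs(triangle)
print(compare_ldfs([1.55, 1.15], chain_ladder))  # flags development period 1 only
```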

Ongoing monitoring for reserving AI centers on actual-to-expected analysis: each quarter, the reserve actuary should track how actual development compares to the model’s implied development pattern and document the comparison. A model that consistently underdevelops or overdevelops by more than a pre-defined threshold should trigger re-validation. The monitoring cadence and threshold should be documented before deployment, not established after the model shows unexpected performance.
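Writing the trigger rule down as code is one way to make the pre-deployment commitment unambiguous. A minimal sketch, assuming a hypothetical rule that re-validation is triggered when the last three quarters all miss in the same direction by more than 10 percent; the threshold and run length are placeholders the reserve actuary would set and document.

```python
from typing import List, Sequence


def ae_ratios(actual: Sequence[float], expected: Sequence[float]) -> List[float]:
    """Quarterly actual-to-expected development ratios."""
    return [a / e for a, e in zip(actual, expected)]


def revalidation_trigger(actual: Sequence[float], expected: Sequence[float],
                         threshold: float = 0.10, consecutive: int = 3) -> bool:
    """True when the last `consecutive` quarters all miss in the same direction by more than threshold."""
    ratios = ae_ratios(actual, expected)
    if len(ratios) < consecutive:
        return False
    recent = ratios[-consecutive:]
    model_underdevelops = all(r > 1 + threshold for r in recent)  # actual persistently above expected
    model_overdevelops = all(r < 1 - threshold for r in recent)   # actual persistently below expected
    return model_underdevelops or model_overdevelops
```

Requiring a same-direction run, rather than reacting to any single quarter's miss, is what separates a consistent bias from ordinary development noise; the run length is itself a judgment that belongs in the pre-deployment documentation.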

Pension and Retirement Applications

Pension applications in the Academy brief cover mortality assumption development, asset allocation optimization, and actuarial equivalence testing for benefit calculations. These applications carry a fiduciary dimension that underwriting and P&C reserving do not. The enrolled actuary signing a pension valuation is subject to the ERISA prudent expert standard, which requires that actuarial assumptions be reasonable in the aggregate and that the basis for adoption be documented with enough specificity for a subsequent reviewer to assess the judgment. When AI or ML tools are used to develop or refine mortality assumptions, the workpaper must document: the data sources used to train the model, how the model’s mortality output compares to published SOA tables (the PRI-2012, RPH-2014, or MP improvement scale series, as applicable), any plan-specific adjustments made by the actuary to the model output, and the professional judgment basis for those adjustments. Using a vendor tool to generate mortality assumptions and then adopting them without an independent actuary check against published tables is not a defensible professional practice under ERISA, and it is not consistent with ASOP No. 25 (Credibility Procedures) when credibility is the basis for assumption weighting.
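The independent check against published tables can start with two simple comparisons: the ratio of model q_x to reference q_x by age, and the implied curtate life expectancy. The sketch below uses hypothetical q_x values only; it does not reproduce any SOA table, which a real workpaper would load from the published source. The 15 percent tolerance is likewise an illustrative placeholder.

```python
from typing import Dict


def compare_to_reference(model_qx: Dict[int, float], reference_qx: Dict[int, float],
                         tolerance: float = 0.15) -> Dict[int, float]:
    """Ages where model q_x differs from the reference q_x by more than tolerance (relative)."""
    flagged = {}
    for age, q_model in model_qx.items():
        ratio = q_model / reference_qx[age]
        if abs(ratio - 1.0) > tolerance:
            flagged[age] = ratio
    return flagged


def curtate_life_expectancy(qx_by_age: Dict[int, float], start_age: int) -> float:
    """e_x as the sum of k-year survival probabilities, truncated at the last tabulated age."""
    survival = 1.0
    e = 0.0
    age = start_age
    while age in qx_by_age:
        survival *= 1.0 - qx_by_age[age]
        e += survival
        age += 1
    return e
```

Flagged ages are where the actuary's documented adjustments, and the professional judgment behind them, need to be most explicit.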

From Inventory to Test Plan: The Actuarial Translation

The Academy brief establishes that AI is being used across underwriting, claims, reserving, and pension functions. The actuarial team’s job is to translate that use case inventory into a per-model control record. Each record should answer four questions: who owns the review; what pre-deployment tests were run and what thresholds defined a pass; what ongoing monitoring is required and at what cadence; and what third-party vendor documentation belongs in the actuarial workpaper versus the legal file. A carrier with twelve AI systems in production and no per-model records has a use case inventory. It does not have a governance framework.
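A per-model control record can be as simple as a structured object that refuses to look complete until all four questions are answered. The field names and gap messages below are hypothetical, chosen to mirror the four questions in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ControlRecord:
    """Hypothetical per-model record answering the four control questions."""
    model_name: str
    owner: Optional[str] = None
    # test name -> pass threshold, written down before the test is run
    pre_deployment_tests: Dict[str, str] = field(default_factory=dict)
    # monitoring metric -> cadence
    monitoring: Dict[str, str] = field(default_factory=dict)
    vendor_sourced: bool = False
    workpaper_vendor_docs: List[str] = field(default_factory=list)
    legal_file_vendor_docs: List[str] = field(default_factory=list)

    def open_questions(self) -> List[str]:
        gaps = []
        if not self.owner:
            gaps.append("owner")
        if not self.pre_deployment_tests:
            gaps.append("pre-deployment tests and pass thresholds")
        if not self.monitoring:
            gaps.append("ongoing monitoring and cadence")
        if self.vendor_sourced and not (self.workpaper_vendor_docs and self.legal_file_vendor_docs):
            gaps.append("vendor documentation split between workpaper and legal file")
        return gaps


def governance_gaps(records: List[ControlRecord]) -> Dict[str, List[str]]:
    """Models whose record cannot yet answer all four questions."""
    return {r.model_name: r.open_questions() for r in records if r.open_questions()}
```

An inventory that returns a non-empty result from a check like this is, in the text's terms, still a use case inventory rather than a governance framework.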

Consider a claims fraud detection model sourced from a vendor, the type of application the Academy brief catalogues under fraud detection and the NAIC survey shows deployed across a majority of carriers. The per-model record would specify the following. Owner: chief actuary, with named backup. Pre-deployment tests: precision and recall on a holdout validation set, false positive rate broken out by claimant demographic and claim type, review of the vendor’s internal validation report by the carrier’s own data science or actuarial team, and a one-page actuary attestation that the model’s output is appropriate for its intended use. Ongoing monitoring: monthly false-positive audit comparing AI-referred claims to adjuster override decisions, quarterly SIU referral-to-confirmed ratio by model tier, and an annual model performance review against current claim experience. Vendor documentation in the workpaper: the model card describing training data and intended use, the most recent independent validation report summary, and the date of the most recent audit rights exercise. The full vendor validation report and audit records stay in the legal file, indexed to the workpaper.
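The quantitative pieces of that record reduce to a few metrics. The sketch below assumes hypothetical inputs: boolean referral predictions and confirmed outcomes for the holdout set, (model tier, confirmed) pairs for each quarter's SIU referrals, and a list of adjuster-override flags for the monthly audit. The referral-to-confirmed ratio is expressed here as confirmations per referral.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def precision_recall(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall of fraud referrals against confirmed outcomes on a holdout set."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def confirmed_per_referral_by_tier(referrals: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """referrals: (model_tier, confirmed_by_siu) for each SIU referral in the quarter."""
    referred = defaultdict(int)
    confirmed = defaultdict(int)
    for tier, ok in referrals:
        referred[tier] += 1
        confirmed[tier] += int(ok)
    return {t: confirmed[t] / referred[t] for t in referred}


def override_rate(adjuster_overrode: Sequence[bool]) -> float:
    """Monthly audit: share of AI-referred claims where the adjuster overrode the referral."""
    return sum(adjuster_overrode) / len(adjuster_overrode) if adjuster_overrode else 0.0
```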

That documentation structure takes about two days to build for an existing model if the underlying test records exist. If they do not exist, the pre-deployment testing must be reconstructed or re-run on current data. Carriers that discover during an examination that no pre-deployment testing records were retained for models deployed two or three years ago are in the most difficult position: they cannot demonstrate what they would have needed to demonstrate before deployment, and a post-hoc validation on current data does not answer the regulator’s question about what the carrier knew and when.

Third-Party Vendor Evidence in the Actuarial Workpaper

The NAIC model bulletin is specific about vendor documentation: contracts with audit rights are a prerequisite, but they are not sufficient. The documentation standard includes evidence of "any audits or confirmation processes performed." An audit clause that has never been exercised does not meet that standard. This distinction matters for how actuarial workpapers are structured.

The workpaper is the actuary’s professional record of the actuarial work performed. For models sourced from a third-party vendor, the actuary cannot sign off on the model’s appropriateness without some evidence of what the vendor built and how it was validated. The actuary does not need to reproduce the vendor’s validation work independently; that is the function of the audit right. But the workpaper should contain enough from the vendor’s validation materials to support the actuary’s professional judgment. In practice, that typically means: the model card or technical specification document identifying training data, intended use, and known limitations; a summary of the vendor’s validation statistics (precision, recall, AUC, or the relevant performance metric for the use case); the date and scope of the most recent audit the carrier conducted under its contract audit right; and the actuary’s written professional judgment on whether the vendor’s validation approach is adequate for the carrier’s intended application.

Carriers using third-party AI vendors whose models appear in the NAIC’s emerging vendor registry discussions face an additional documentation question: whether the vendor’s model card and validation materials are current. A vendor validation report from 2023 supporting a model that has been updated twice since then does not satisfy the documentation expectation. Actuarial workpapers should record the version of the model in use and confirm that the validation materials correspond to that version, not to an earlier release.
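The workpaper contents listed above, together with the version-match requirement, can be checked mechanically. A minimal sketch with hypothetical field names; the one-year maximum audit age is an assumed internal policy setting, not a bulletin requirement.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class VendorEvidence:
    """Hypothetical summary of the vendor evidence held in the actuarial workpaper."""
    model_version_in_use: str
    validation_report_version: str
    has_model_card: bool                  # training data, intended use, known limitations
    has_metrics_summary: bool             # precision, recall, AUC, or the relevant metric
    last_audit_exercised: Optional[date]  # None means the contract audit right was never used
    actuary_judgment_on_file: bool


def vendor_evidence_gaps(ev: VendorEvidence, as_of: date, max_audit_age_days: int = 365) -> List[str]:
    gaps = []
    if ev.validation_report_version != ev.model_version_in_use:
        gaps.append("validation materials do not match the model version in use")
    if not ev.has_model_card:
        gaps.append("model card or technical specification missing")
    if not ev.has_metrics_summary:
        gaps.append("vendor validation statistics missing")
    if ev.last_audit_exercised is None:
        gaps.append("contract audit right never exercised")
    elif (as_of - ev.last_audit_exercised).days > max_audit_age_days:
        gaps.append("most recent audit is older than the carrier's policy allows")
    if not ev.actuary_judgment_on_file:
        gaps.append("actuary's written judgment on the vendor validation missing")
    return gaps
```

A model updated since its last validation report fails the first check even when every other item is on file, which is the situation the text describes.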

The split between what goes in the actuarial workpaper and what goes in the legal file is a practical division of labor, not a regulatory requirement. What matters is that both files exist, that they are indexed to each other, and that the carrier can produce both within the 30-day window regulators expect during market conduct examinations under the NAIC bulletin framework.

The NAIC Connection: Documentation Under Examination

The 12-state AI evaluation tool pilot running through September 2026 is building the examination templates that will shape documentation expectations when the NAIC formally adopts the evaluation tool at its Fall 2026 National Meeting. The tool asks carriers to identify their high-risk AI systems (Tier 1 and 2 in the framework above), describe the governance controls applied to each, and provide documentation of validation and ongoing monitoring. Carriers that have built per-model control records structured around decision impact tiers will be able to respond to those questions directly. Carriers that have a use case inventory but no per-model documentation will need to rebuild the record under time pressure, during an active examination.

The 30-day production window regulators expect for examination responses is not generous for carriers with large AI portfolios. A carrier running fifteen AI models across underwriting, claims, and pricing, sourced from a mix of internal development and third-party vendors, needs pre-built documentation files for each model, indexed to a master inventory, with named owners responsible for maintaining current versions. Producing that documentation in response to an examination request is significantly harder than maintaining it as a normal part of model governance. The carriers that have established per-function actuarial review protocols before examination requests arrive are the ones generating the documentation as a byproduct of their normal governance cycle rather than as a fire-drill response.

A review of the compliance reporting templates in states that have adopted the bulletin with supplemental requirements shows that documentation gaps cluster in three places: third-party vendor evidence (the audit right was in the contract but never exercised), pre-deployment testing records for models deployed before the bulletin was adopted, and ownership assignments for models that are technically maintained by data science or IT rather than actuarial or underwriting. The Academy brief establishes the breadth of AI use in insurance. The control map for actuarial teams is what closes the gap between that inventory and a state examination file.
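Because the gaps cluster in predictable places, a master inventory can be screened for all three before an examination letter arrives. The sketch below is hypothetical: the inventory fields are illustrative, and the state's bulletin adoption date is a required input rather than a built-in value, since it differs by state.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional

ACCOUNTABLE_FUNCTIONS = {"actuarial", "underwriting"}


@dataclass
class InventoryEntry:
    name: str
    vendor_sourced: bool
    audit_right_exercised: bool
    deployed_on: date
    predeployment_records_retained: bool
    owner_function: Optional[str]  # e.g. "actuarial", "underwriting", "data_science", "it"


def gap_report(inventory: List[InventoryEntry], state_adoption_date: date) -> Dict[str, List[str]]:
    """Sort inventory entries into the three documentation-gap clusters."""
    report: Dict[str, List[str]] = {
        "vendor_audit_never_exercised": [],
        "pre_bulletin_model_without_test_records": [],
        "owner_outside_actuarial_or_underwriting": [],
    }
    for m in inventory:
        if m.vendor_sourced and not m.audit_right_exercised:
            report["vendor_audit_never_exercised"].append(m.name)
        if m.deployed_on < state_adoption_date and not m.predeployment_records_retained:
            report["pre_bulletin_model_without_test_records"].append(m.name)
        if m.owner_function not in ACCOUNTABLE_FUNCTIONS:
            report["owner_outside_actuarial_or_underwriting"].append(m.name)
    return report
```

A model can land in more than one cluster; one that lands in all three is the highest-priority rebuild.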

Actuarial Implications

The Academy’s June 2026 brief is a credible and useful professional document. Its value for actuarial governance purposes is primarily as a prompt: here are the categories of AI use in your industry; now build a per-model control record for each application in your organization. The four control questions that record must answer (owner, pre-deployment tests, ongoing monitoring, and vendor documentation) are the same regardless of which line of business or which technology underlies the application. The answers differ significantly by function, which is why grouping use cases by decision impact tier rather than by technology label produces a more actionable governance structure.

Actuaries who build those records now, before the NAIC evaluation tool reaches formal adoption and before their state conducts an examination, will have converted a professional inventory document into a defensible compliance framework. Actuaries who wait will be building the same documentation under examination pressure, which is both more expensive and less reliable than building it as part of a structured governance cycle. The Academy brief named the use cases. The 30-day clock starts when the examination letter arrives.

Sources

American Academy of Actuaries, AI and Data Resources (including the June 11, 2026 AI Use Cases in Insurance and Pension issue brief)
NAIC, Artificial Intelligence Topic Page
NAIC, Model Bulletin on the Use of Artificial Intelligence Systems by Insurers (December 2023)
Quarles, "Nearly Half of States Have Now Adopted NAIC Model Bulletin on Insurers' Use of AI"
Colorado Division of Insurance, SB 21-169 Guidance
Actuarial Standards Board, ASOP No. 56: Modeling