Insurer AI Returns: What the Evident AI Index Measures and What an Actuarial Scorecard Should

The 2026 Evident AI Index scores 30 major insurers across more than 60 AI capability indicators, yet only 3 of those 30 carriers disclose comparable enterprise-level financial returns: Manulife at $217 million, Generali at $116 million, and Intact Financial at $145 million in 2025 AI-attributed value (Evident Insights, June 2026). The remaining 27 are ranked by process proxies -- headcount, job postings, use-case volume -- not by loss ratios or claim outcomes.

That gap is not a criticism of the index's methodology, which is deliberate and defensible. It is a diagnosis of where the industry sits in its measurement cycle. Carriers have built AI capability at scale. Turning that capability into attributable insurance returns is harder, slower, and methodologically messier than the investment cycle that preceded it. The actuarial question is what a credible measurement framework actually looks like -- and what it requires carriers to do differently from what most are doing today.

Rankings, Talent, and the Disclosure Boundary

The 2026 Evident AI Index, now in its second edition, evaluates the AI maturity of 30 of the world's largest insurers in North America and Europe across four domains: talent, innovation, leadership, and transparency. The index relies exclusively on publicly available data -- job postings, patent filings, research publications, earnings call transcripts, and press releases -- scored against 68 individual indicators. Allianz tops this year's rankings, overtaking AXA, with Manulife third, Zurich fourth (up from 12th in 2025), and Liberty Mutual fifth. Three of last year's top five retained their positions.

The capability signals the index captures are genuinely informative. AI-specialist roles across the 30 carriers grew 32% year-over-year and now represent approximately 1 in 50 employees, even as broader insurance industry headcounts contracted 2.2% over the same period (Evident Insights, June 2026). Twenty of 30 carriers now publicly report at least one AI use case with documented outcomes, up 8 from the prior year. Allianz has registered 900 AI use cases globally; the top five carriers account for 48% of well-documented industry use cases. Agentic AI deployments, meaning systems in which models take multi-step actions without step-by-step human direction, have grown from 5% to 25% of newly disclosed use cases over the past six months.

But the methodology's output-agnostic design creates a ceiling for what it can demonstrate. Evident CEO Alexandra Mousavizadeh noted that "capabilities that once set the frontrunners apart are becoming more widespread" across the 30 carriers (Evident Insights, June 2026). Democratized capability does not mean democratized proof. Allianz's AI talent pool, 28% larger than AXA's, is visible in job postings. Whether it is moving Allianz's combined ratio is not something the index can show.

The use-case quality data reinforces the ceiling. Of the documented use cases across the 30 carriers, 49% remain narrow in scope, focused on speed and cost reduction rather than underwriting quality or risk selection. Only 8% demonstrate advanced maturity with agentic reasoning (Evident Insights, June 2026). Speed and cost reduction are operationally valuable. They do not appear in a loss ratio unless they change risk selection or loss severity. The 49% of use cases targeting speed and the 8% demonstrating agentic reasoning represent two very different relationships to the insurance P&L.

The Three Carriers with Enterprise ROI Figures

Manulife, Generali, and Intact Financial have attempted something the other 27 have not: attaching a dollar figure to AI returns at the enterprise level. The three figures are not perfectly comparable -- carriers define "AI value" differently -- but they establish a rough order of magnitude for what return disclosure looks like in practice. Manulife generated more than $217 million in AI-attributed value in 2025 and projects $723 million by the end of 2027. Generali reported $116 million in 2025 with a $407 million projection for 2027. Intact Financial's 2025 figure was revised upward 33% to $145 million, with projected value reaching $361 million by the end of the decade. Combined, the three project more than $1 billion in AI-attributed value within two years (Evident Insights, June 2026), though the horizons differ: Manulife's and Generali's 2027 targets alone sum to $1.13 billion, while Intact's $361 million target runs to the end of the decade.

The upward revision to Intact's figure is the most analytically interesting data point. Early AI value estimates typically undercount returns from adjacent workflow changes and productivity compounding that emerge over implementation cycles. They also undercount the competitive cost of not investing when peers are doing so at scale: the counterfactual deteriorates over time in ways that are hard to quantify prospectively but become visible retrospectively in market share and pricing adequacy.

What these three disclosures do not contain is a bridge from enterprise AI value to the specific insurance metrics actuaries use. None of the three figures is broken down by underwriting versus claims versus operations. None specifies whether the attributed returns include loss ratio improvement, expense ratio reduction, or both. The dollar figures are boardroom metrics. An actuary reviewing a rate filing, reserving a portfolio, or modeling capital adequacy needs to know where in the insurance P&L those dollars actually live before they can incorporate them into a technical analysis.

The Industry-Wide Measurement Gap

The Evident disclosure gap, 27 of 30 carriers without comparable ROI figures, is consistent with measurement weakness across the broader industry. Capgemini's World Property and Casualty Insurance Report 2026, drawn from surveys of 344 senior executives, 809 insurance employees, and 1,113 policyholders across the Americas, Europe, and Asia-Pacific, puts a specific number on the deficit: 42% of insurers track no AI performance metrics whatsoever, and more than 55% report unclear returns on AI initiatives (Capgemini, May 2026). Only 10% of P&C carriers have successfully scaled AI across their operations.

The spending has continued regardless. More than 50% of 2025 insurance venture dealflow went to AI-centric transactions, a shift away from the broader insurtech focus of prior years (Evident Insights, June 2026). That concentration of capital has not produced proportional investment in measurement. The Capgemini data identifies a structural mismatch: 72% of insurer AI investment flows to infrastructure and technology, while only 28% flows to training and change management (Capgemini, May 2026). Infrastructure investment is exactly what a capability index like Evident can measure. It does not produce measurable insurance outcomes unless the underlying workflows change.

The evidence that workflows are changing more slowly than the infrastructure spending would suggest is direct: 47% of insurance employees with access to AI tools report their workday is essentially unchanged after 18 months of use (Capgemini, May 2026). Infrastructure deployed without workflow redesign is capability without output. It improves an insurer's score on a maturity index. It does not improve its combined ratio.

Five Metrics That Connect AI Spending to Insurance Performance

The measurement deficit is partly methodological. Carriers report what is easy to count: number of claims straight-through processed, processing speed, reduction in manual touches per submission. These are valid operational metrics, but they do not answer the questions that actuarial and finance leadership need answered. Five metrics form the actuarial core of an AI return measurement framework.

Metric	What AI Should Move	Minimum Credibility Threshold	Primary Confounder
Loss ratio (AI-screened segment)	Selection quality; adverse risk exclusion	3,000+ exposure units, 2+ policy years	Underwriting mix shift
Hit ratio	Submission binding quality; new business selection	1,000+ submissions per period, matched control period	Risk appetite or distribution shift
Claim leakage rate	Closure accuracy; reduction in payments above reserve	500+ closed claims; documented pre-AI leakage baseline	Claim settlement timing shift
Expense ratio by function	Cost per unit of underwriting or claims volume	Full-year expense reclassification; shared-service allocation rule	Shared-service cost allocation methodology
Claims cycle time and ALAE	Days-to-close; allocated loss adjustment expense per claim	1,000+ closed claims per line; severity-stratified comparison	Complexity mix across claim severity bands

The expense ratio is the most immediately traceable line. The P&C industry's expense ratio fell to 25.3 in 2024 from 27.7 in 2014, a 2.4-point reduction over a decade driven primarily by remote work and operational consolidation rather than AI (Carrier Management, January 2026). Morgan Stanley projects a further 2.0-point reduction by 2030 attributable to AI, producing $9.3 billion in operating income uplift for their carrier cohort at a 180 basis point operating margin improvement (Carrier Management, January 2026). Because expense ratio points are measured against premium, a carrier writing $2 billion in premium gains $40 million from a 2-point reduction. That is real. It also requires a clean attribution methodology to separate AI contribution from the mix of operational initiatives that affect expense ratios simultaneously.

The loss ratio metric is the most contested. AI underwriting tools consistently demonstrate improved selection in controlled pilots. The mechanism is straightforward: if an AI model screens out a subset of submissions that historically generate adverse loss experience, the loss ratio on the bound book improves. For a carrier with $1 billion in premiums where AI-screened renewals represent 40% of the portfolio, a 3-point loss ratio improvement on that segment produces $12 million in underwriting benefit. The challenge is holding all else constant -- mix, pricing level, limits profile, and territory distribution -- between the pre-AI and post-AI periods. Rarely do all four hold simultaneously.
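The point-to-dollar arithmetic in the two examples above can be sketched in a few lines. This is a minimal illustration, assuming each ratio point equals 1% of the premium base it is measured against (so the $2 billion in the expense example is read here as premium). The function name and parameters are hypothetical, not drawn from any cited source.

```python
def points_to_dollars(premium_base: float,
                      ratio_points: float,
                      segment_share: float = 1.0) -> float:
    """Dollar value of a ratio change expressed in points of premium.

    premium_base  : premium the ratio is measured against, in dollars
    ratio_points  : improvement in ratio points (2.0 means 2 points)
    segment_share : fraction of the premium base the change applies to
    """
    return premium_base * segment_share * ratio_points / 100.0


# Expense example: 2 points on a $2 billion premium base.
expense_benefit = points_to_dollars(2_000_000_000, 2.0)

# Loss ratio example: 3 points on the 40% AI-screened share of $1 billion.
loss_benefit = points_to_dollars(1_000_000_000, 3.0, segment_share=0.40)

print(f"Expense benefit:    ${expense_benefit:,.0f}")
print(f"Loss ratio benefit: ${loss_benefit:,.0f}")
```

The sketch returns $40 million and $12 million, matching the figures in the text. It says nothing about attribution: it converts an assumed improvement into dollars and does not establish that AI caused the improvement.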

Pilot Results and Portfolio Credibility

Pilot AI implementations in underwriting and claims almost always show stronger results than full-scale rollouts. Three mechanisms explain the gap consistently.

Risk segment selection is the first. Pilots target the most structured, highest-volume, lowest-complexity claim types or submission segments: water damage claims below a severity threshold, personal auto renewals with complete telematics data, or small-premium commercial submissions with standardized application data. Those are the segments where AI has the highest probability of accurate automation. Extending to complex commercial property, specialty liability, or catastrophe-exposed homeowners brings in data heterogeneity, low volume, and less training signal. Pilot results do not transfer.

Volume credibility is the second. A 500-claim pilot does not carry full credibility. Standard actuarial credibility weighting for a pure premium indication on 500 claims at moderate severity assigns roughly 30 to 40 percent weight to the observed data, with the remainder going to the prior expectation. A carrier reporting a 10-point loss ratio improvement from a pilot of that size is reporting a result that could fall within the range of natural statistical variance. The minimum threshold in the scorecard table above -- 3,000 exposure units and two full policy years for the loss ratio metric -- is not arbitrary. It reflects the credibility standard that distinguishes a signal from noise in insurance development data.
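One way to see where a 30 to 40 percent weight can come from is classical limited-fluctuation credibility. The sketch below is illustrative only: it assumes the common 1,082-claim frequency standard (a 90% probability of falling within 5% of the true value under Poisson claim counts) and a severity coefficient of variation of 1.5 to 2.0 as a stand-in for "moderate severity". Both assumptions, and the function names, are mine rather than the article's.

```python
import math


def full_credibility_claims(severity_cv: float,
                            frequency_standard: float = 1082.0) -> float:
    """Claims needed for full credibility of a pure premium estimate.

    Scales the frequency-only standard by (1 + CV^2) to allow for
    severity variation (Poisson claim counts assumed).
    """
    return frequency_standard * (1.0 + severity_cv ** 2)


def credibility_weight(n_claims: float, severity_cv: float) -> float:
    """Square-root rule for partial credibility, capped at 1.0."""
    return min(1.0, math.sqrt(n_claims / full_credibility_claims(severity_cv)))


def credibility_weighted(observed: float, prior: float, z: float) -> float:
    """Blend the observed result with the prior expectation."""
    return z * observed + (1.0 - z) * prior


for cv in (1.5, 2.0):  # assumed severity coefficients of variation
    z = credibility_weight(500, cv)
    # A 10-point observed improvement against a prior of no improvement.
    blended = credibility_weighted(10.0, 0.0, z)
    print(f"CV={cv}: Z={z:.2f}, credibility-weighted improvement={blended:.1f} points")
```

Under these assumptions the weight on 500 claims falls between roughly 0.30 and 0.38, so a 10-point pilot improvement supports an indication of only about 3 to 4 points once blended with a prior of no change.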

Carrier self-selection is the third. The Capgemini trailblazer analysis found that the top 10% of carriers successfully scaling AI achieved 21% higher revenue growth and a 51% greater share price increase over three years compared to the broader industry (Capgemini, May 2026). But trailblazers are not a random draw from the carrier population. They are carriers with stronger baseline underwriting discipline, more consistent data infrastructure, and higher pre-AI profitability -- the same characteristics that also make AI implementation tractable. Cross-sectional correlation between AI investment and financial performance overstates causal AI impact because it does not control for baseline carrier quality. The 21% revenue growth differential is real. The share attributable to AI versus to the underlying carrier characteristics that enabled AI at scale cannot be read directly from the correlation.

Three Confounders That Distort AI ROI Measurement

Even with credible sample sizes and multi-year development, three specific confounders routinely inflate AI ROI estimates in carrier analyses.

Underwriting mix shift is the most common. A carrier using AI to improve new-business submission screening often simultaneously tightens its risk appetite or shifts its distribution strategy toward lower-hazard segments. Loss ratio improvement follows, but the driver is partly the mix change. Separating AI contribution from mix change requires holding constant the carrier's risk classification distribution, territory exposure, policy limits profile, and pricing tier composition between the pre-AI and post-AI comparison periods. Most carrier analytics environments do not have the data infrastructure to do that cleanly, which is why the 72% infrastructure allocation versus 28% change management finding from Capgemini matters: without the data layer, the attribution is not possible.
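Holding mix constant can be illustrated with a simple standardization: re-weight the post-AI segment loss ratios by the pre-AI premium mix and compare. The sketch below uses three hypothetical hazard segments with invented premiums and loss ratios; none of the figures come from the cited sources, and a real analysis would standardize on territory, limits, and pricing tier as well.

```python
# Hypothetical books: segment -> (earned premium in $M, loss ratio)
pre_ai = {
    "low_hazard": (40.0, 0.55),
    "mid_hazard": (40.0, 0.65),
    "high_hazard": (20.0, 0.80),
}
post_ai = {
    "low_hazard": (55.0, 0.54),
    "mid_hazard": (35.0, 0.63),
    "high_hazard": (10.0, 0.78),
}


def loss_ratio_at_mix(lr_book: dict, weight_book: dict) -> float:
    """Loss ratio using lr_book's segment loss ratios and weight_book's premium mix."""
    total_premium = sum(premium for premium, _ in weight_book.values())
    weighted_losses = sum(weight_book[seg][0] * lr_book[seg][1] for seg in lr_book)
    return weighted_losses / total_premium


lr_pre = loss_ratio_at_mix(pre_ai, pre_ai)             # reported pre-AI loss ratio
lr_post = loss_ratio_at_mix(post_ai, post_ai)          # reported post-AI loss ratio
lr_post_pre_mix = loss_ratio_at_mix(post_ai, pre_ai)   # post-AI results at pre-AI mix

total_change = lr_post - lr_pre
within_segment_change = lr_post_pre_mix - lr_pre   # mix held constant
mix_change = lr_post - lr_post_pre_mix             # attributable to mix shift

print(f"Total change:          {100 * total_change:+.2f} points")
print(f"Within-segment change: {100 * within_segment_change:+.2f} points")
print(f"Mix-shift change:      {100 * mix_change:+.2f} points")
```

In this invented example the reported loss ratio improves by about 4.45 points, but only about 1.6 points remain once the mix is held at its pre-AI distribution. The within-segment figure is still an upper bound on AI's contribution, because it also contains rate and cycle effects.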

The economic cycle is the second. The U.S. P&C industry's loss ratio declined 5.4 points from 2023 to 2024, driven primarily by earned premium growth outpacing loss emergence in personal lines following two years of aggressive rate action (Carrier Management, January 2026). Carriers that deployed underwriting AI during that rate environment will show combined ratio improvement that includes cyclical tailwind. Attributing the full improvement to AI overstates the technology contribution. The inverse is also true: carriers that deploy AI during adverse development cycles may show no measurable improvement even when AI is genuinely working, because the cycle masks the AI signal.

Claim settlement timing is the third, and the one actuaries are best positioned to identify because it lives in loss development patterns rather than in the expense or premium lines. AI-accelerated claims settlement closes claims earlier in the development cycle, shifting IBNR from open reserves to paid losses earlier than historical lag factors would predict. Earlier settlement compresses reported loss ratios in the short term: fewer outstanding claims means lower IBNR provisions. But earlier settlement does not reduce the economic cost of the claim; it advances the payment timing. A carrier comparing its reported loss ratio before and after AI claims implementation must restate both periods on the same settlement-lag basis to isolate AI's contribution from the payment acceleration effect. Failure to do so produces a loss ratio improvement that partially reverses as development matures.

The Measurement Infrastructure the Next Phase Needs

The Evident AI Index documents where 30 carriers have invested in AI capability. The scorecard above identifies what those investments need to be measured against to produce credible financial attribution. The gap between the two is largely a data and governance problem, not a technology problem.

For the measurement framework to work at the carrier level, three agreements need to be reached before AI deployment, not after: what the counterfactual would have been without AI (which requires a defined control group or statistical control design); what the time horizon is for measuring returns (loss ratio attribution typically requires three to five years of development before the signal is credible); and what the attribution methodology is for separating AI contribution from mix change, cycle effect, and reserve development. None of these questions can be answered retrospectively if the carrier did not capture the pre-AI baseline data in the right form.
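A defined control group makes the counterfactual concrete. One common design is a difference-in-differences comparison: measure the change in the AI-treated segment and subtract the change in a comparable untreated segment over the same period. The sketch below is a minimal illustration with invented loss ratios. It assumes the two segments would have moved in parallel without AI, which has to be argued from pre-deployment data rather than taken for granted.

```python
def diff_in_diff(treated_pre: float, treated_post: float,
                 control_pre: float, control_post: float) -> float:
    """Change in the treated segment net of the change in the control segment."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Hypothetical loss ratios captured before and after deployment.
ai_segment = {"pre": 0.66, "post": 0.60}   # AI-screened business
control = {"pre": 0.65, "post": 0.62}      # comparable business without AI

raw_change = ai_segment["post"] - ai_segment["pre"]
net_effect = diff_in_diff(ai_segment["pre"], ai_segment["post"],
                          control["pre"], control["post"])

print(f"Raw change in AI segment: {100 * raw_change:+.1f} points")
print(f"Net of control segment:   {100 * net_effect:+.1f} points")
```

Here the AI segment improves by 6 points, but the control segment improves by 3 points over the same period, leaving about 3 points as the estimate attributable to AI. The design only works if the pre-period values for both groups were captured before deployment.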

The regulatory environment is beginning to push carriers toward this infrastructure. The NAIC's AI Model Bulletin has now been adopted in 24 states and Washington, D.C., requiring documented AI governance: risk management frameworks, accountability structures, and bias testing protocols (NAIC, 2023 -- multiple state adoptions through 2025). The bulletin does not yet require financial outcome disclosure. The NAIC's AI examination pilots that launched in early 2026 focus on governance documentation, not performance attribution. That will likely evolve as examiners grow more comfortable with AI oversight: the next question after "do you have governance frameworks" is "are your AI systems performing as documented," which requires outcome data tied to measurable insurance metrics.

The macro projection makes the case for urgency. Morgan Stanley's January 2026 analysis attributes the next 2.0-point P&C expense ratio reduction by 2030 specifically to AI, with $9.3 billion in operating income uplift across their carrier cohort (Carrier Management, January 2026). The carriers that can demonstrate that attribution with clean, auditable measurement will be in a fundamentally different position with investors, regulators, and rating agencies than the 42% that currently track no AI metrics at all. The industry has built the capability. The carriers that build the measurement to match it will be the ones that can actually prove what it is worth.

Sources

"Insurers Continue to Invest in AI, Says Evident AI Index," Insurance Edge, June 19, 2026. insurance-edge.net
"2026 Evident AI Index for Insurance Key Findings Report," Evident Insights, June 2026. evidentinsights.com
"The Moment of AI Truth for Property & Casualty Insurance: Trailblazers See 21% Higher Revenue Growth," Capgemini, May 2026. capgemini.com
"Expense Ratio Analysis: AI, Remote Work Drive Better P/C Insurer Results," Carrier Management, January 12, 2026. carriermanagement.com
NAIC Model Bulletin on the Use of Artificial Intelligence Systems by Insurers, National Association of Insurance Commissioners, adopted December 2023; 24 state adoptions through 2025. content.naic.org
"Allianz Ranks #1 in the 2026 Evident AI Index for Insurance," Allianz Media Center, June 16, 2026. allianz.com