Reviewing technical submissions in CMS innovation competitions over the past three years has shown a clear shift from accuracy-first to explainability-first evaluation criteria, a regulatory preference that health plan actuaries should note. On December 15, 2025, CMS selected Milliman as the winner of its Crushing Fraud Chili Cook-Off Competition for a proprietary explainable AI tool that detects fraud, waste, and abuse in Medicare claims. The tool’s defining feature is its “glass-box” architecture: a deterministic algorithm rooted in actuarial science and statistics that can be deconstructed to show exactly why a provider was flagged, rather than producing opaque risk scores from black-box neural networks.
The significance extends beyond a vendor recognition event. CMS is signaling, through its evaluation criteria and its operational investments in AI-powered fraud prevention, that explainability is becoming a prerequisite for program integrity technology. For health plan actuaries designing payment integrity programs, procuring fraud detection vendors, or quantifying fraud, waste, and abuse (FWA) savings in rate filings, this competition result provides a concrete reference point for what the regulator expects.
The stakes are substantial. The National Health Care Anti-Fraud Association estimates that fraud costs the U.S. healthcare system more than $100 billion annually. CMS reported $37.4 billion in estimated improper Medicaid payments for fiscal year 2025 alone. Medicare Fee-for-Service improper payments reached $28.83 billion at a 6.55% rate in FY 2025, down from $31.70 billion at 7.66% the prior year. The DOJ’s 2025 National Health Care Fraud Takedown, the largest in U.S. history, charged 324 defendants in connection with $14.6 billion in alleged fraud, more than double the $6 billion record set one year earlier.
The Crushing Fraud Chili Cook-Off: Competition Format and Judging Priorities
CMS launched the Crushing Fraud Chili Cook-Off Competition in August 2025 as a market-based research challenge aimed at harnessing explainable AI and machine learning models to detect anomalies and trends in Medicare claims data that could be translated into novel indicators of fraud. The competition’s name, drawn from CMS’s broader “Crushing Fraud” initiative, belied the technical rigor of the evaluation process.
The competition ran in two phases. Phase 1 opened for proposal submissions from August 19 through September 19, 2025, with CMS evaluating proposed methodologies and selecting finalists. More than 250 submissions came from across the healthcare analytics ecosystem, including consulting firms, academic research teams, and technology vendors. On October 20, 2025, CMS announced ten finalists: Milliman, MindPetal, a joint Stanford and UCSF research team, Visual Connections, and six other organizations.
Phase 2 gave finalists access to CMS Limited Data Sets covering Medicare Fee-for-Service claims across three categories: Hospice, Part B, and Durable Medical Equipment (DME). These datasets represent some of the highest-risk areas for improper billing. Finalists applied their AI and ML techniques to the claims data and submitted both a summary of findings and proposed scalable analytic and policy solutions. The phase culminated in an in-person demonstration event on December 15, 2025, where finalists presented their results to CMS evaluators.
The judging criteria reveal CMS’s regulatory priorities more clearly than any policy statement. CMS did not simply seek the model with the highest fraud detection rate. Instead, the competition explicitly required that solutions be “explainable,” meaning pattern detection alone was insufficient. CMS emphasized that AI and ML must produce insights “transparent and accessible to program integrity teams, regulators, and policymakers.” This requirement reflects the enforcement reality: flagging suspicious claims is only useful if investigators can understand and articulate why a provider’s billing pattern is anomalous, both for internal prioritization and for building cases that hold up in administrative proceedings or litigation.
CMS CIO Patrick Newbold framed the broader objective at the 2025 Health IT Summit: “We have to provide better service for the public and we want industry to actually bring solutions that solve the problems.” That phrasing, “solutions that solve the problems” rather than “models that maximize accuracy,” captures the philosophical shift that Milliman’s win embodies.
Milliman’s Glass-Box Architecture: How Explainable AI Detects Medicare Fraud
Milliman’s winning solution uses a deterministic algorithm grounded in actuarial science and statistics. The “glass-box” label refers to the model’s fundamental design principle: every output can be traced back through transparent reasoning to the specific data inputs and statistical relationships that generated it. This contrasts with deep learning approaches, where the internal weighting of neural network layers makes it effectively impossible to explain why a particular provider received a particular risk score.
The tool works by combining three categories of anomaly detection across all providers in CMS’s Limited Data Sets:
| Anomaly Category | What It Measures | Why It Matters for Fraud Detection |
|---|---|---|
| Behavioral | Billing patterns, service frequency, procedure code usage relative to specialty norms | Identifies providers whose clinical behavior deviates significantly from peers serving similar patient populations |
| Network | Referral relationships, shared patient clusters, geographic service patterns | Detects coordinated billing schemes involving multiple providers or facilities |
| Financial | Total billing volume, cost per beneficiary, reimbursement rate patterns | Flags providers billing at volumes or rates inconsistent with practice size and specialty |
These three anomaly dimensions are synthesized into a single composite risk score for each provider. Crucially, the risk score can be “deconstructed to show exactly why a provider was flagged,” according to Milliman data scientist Adam Hearn. Investigators reviewing a flagged provider can see, for example, that the flag resulted from a combination of DME billing volume three standard deviations above specialty peers, an unusual concentration of referrals from a small number of ordering physicians, and per-beneficiary costs 2.5 times the regional average.
This decomposition serves two practical functions. First, it allows CMS to prioritize investigative resources on providers with both high billing volumes and statistically anomalous patterns. As Hearn noted, “CMS can focus on the providers that are billing thousands or millions of dollars, so when CMS deploys their investigative resources, they’re on the providers that have the biggest financial threat.” Second, it produces documentation that can support administrative actions, payment suspensions, or referrals for criminal prosecution, because the reasoning chain is auditable.
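The scoring pattern described above can be sketched in code. This is only an illustration of the glass-box idea, not Milliman's actual methodology, which has not been published: the category names, weights, and z-score normalization here are all assumptions chosen to show how a deterministic score remains decomposable into the exact contributions that produced it.

```python
from dataclasses import dataclass
from statistics import mean, stdev

# Hypothetical category weights -- illustrative only, not Milliman's formula.
WEIGHTS = {"behavioral": 0.4, "network": 0.3, "financial": 0.3}

@dataclass
class Provider:
    npi: str
    metrics: dict  # category -> raw metric (e.g., billing volume vs. peers)

def peer_zscores(providers, category):
    """Z-score each provider's metric against its peer distribution."""
    values = [p.metrics[category] for p in providers]
    mu, sigma = mean(values), stdev(values)
    return {p.npi: (p.metrics[category] - mu) / sigma for p in providers}

def score_with_decomposition(providers):
    """Composite risk score plus per-category contributions.

    Because the score is a weighted sum of peer-normalized statistics,
    the 'why' dict is an auditable reasoning chain: each flagged
    provider's score can be traced back to its components.
    """
    z = {c: peer_zscores(providers, c) for c in WEIGHTS}
    results = {}
    for p in providers:
        contributions = {c: WEIGHTS[c] * z[c][p.npi] for c in WEIGHTS}
        results[p.npi] = {"score": sum(contributions.values()),
                          "why": contributions}
    return results
```

The key property is that `score` always equals the sum of its `why` components, so an investigator (or an auditor in an administrative proceeding) can reproduce the flag from the inputs; a neural network's risk score offers no analogous decomposition.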
Milliman’s expertise in this space predates the competition. The firm’s Payment Integrity product, a multifactor, rules-based claims auditing solution, has been operating in the health plan market for years. In a partnership with Mastercard Healthcare Solutions, Milliman applied the AI Express development process to its fraud detection capabilities and identified more than $230 million in recoverable paid claims (three times the yield of its legacy detection system on the same dataset) while flagging 2,700 high-risk providers for further investigation.
The False Positive Problem: Why Explainability Outperforms Accuracy Alone
The trade press framed Milliman’s win as a vendor achievement story. The more consequential signal is what the explainability requirement reveals about the operational cost of false positives in Medicare fraud detection.
Black-box models, particularly deep learning systems trained on claims data, excel at pattern recognition. They can identify billing anomalies with high sensitivity. The problem is specificity. A deep learning model trained to flag high-cost providers will inevitably flag oncologists treating advanced cancers, hospice providers managing end-of-life care for complex patients, and specialists in underserved areas who receive referral volumes disproportionate to their practice size. These providers are not committing fraud; they are managing sicker, more expensive patient populations.
Each false positive generates downstream costs: investigative staff hours, provider dispute processes, potential payment suspensions that disrupt care delivery, and provider relations damage that can affect network adequacy. When CMS’s Center for Program Integrity runs approximately 250 models per day, as reported by agency officials, the cumulative investigative burden of even a modest false positive rate becomes substantial. A system that correctly identifies 95% of fraudulent providers but incorrectly flags 10% of legitimate providers will overwhelm investigative capacity.
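The arithmetic behind that claim is worth making explicit. The sketch below works through the base-rate effect using the 95% sensitivity and 10% false positive rate from the paragraph above; the 2% fraud prevalence and 100,000-provider population are illustrative assumptions, since actual prevalence is unknown.

```python
def triage_load(n_providers, fraud_rate, sensitivity, false_positive_rate):
    """Expected flag counts and positive predictive value for a detector."""
    fraudulent = n_providers * fraud_rate
    legitimate = n_providers - fraudulent
    true_flags = sensitivity * fraudulent
    false_flags = false_positive_rate * legitimate
    return {"true_flags": true_flags,
            "false_flags": false_flags,
            "ppv": true_flags / (true_flags + false_flags)}

# Illustrative inputs: 100,000 providers, assumed 2% fraud prevalence,
# 95% sensitivity, 10% false positive rate.
result = triage_load(100_000, 0.02, 0.95, 0.10)
# true_flags = 1,900; false_flags = 9,800; PPV ~ 16%: roughly five of
# every six investigations opened would target a legitimate provider.
```

Even a detector that looks excellent by sensitivity alone produces a flag queue dominated by legitimate providers when fraud is rare, which is exactly why CMS weighted explainability, and the false-positive control it enables, over raw detection rates.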
Milliman’s glass-box approach addresses this by accounting for clinical context within the risk scoring methodology. Because the algorithm is deterministic and statistically grounded, it can normalize for patient acuity, specialty-specific billing distributions, and regional cost variations before generating a risk score. An oncologist billing $4 million annually for chemotherapy infusions in a high-cancer-prevalence region may score differently than a DME supplier billing $4 million for catheter kits with no established clinical justification, even though both providers have similar total billing volumes.
This distinction matters enormously for health plan actuaries quantifying payment integrity savings. A fraud detection system with a high false positive rate will overstate recoverable dollars in initial projections but underdeliver in actual recoveries, because investigations of legitimately high-cost providers do not yield recoupment. Explainable AI models that reduce false positives produce more reliable estimates of actual FWA exposure, improving the accuracy of rate filing assumptions and payment integrity program ROI calculations.
The Federal Enforcement Expansion: From FDOC to CRUSH
Milliman’s win sits within a broader federal enforcement escalation that health plan actuaries need to understand because it affects payment environments, compliance expectations, and program integrity economics across Medicare and Medicaid.
Fraud Defense Operations Center (FDOC). CMS launched the FDOC in March 2025 as a permanent, real-time fraud detection unit. The center represents a philosophical shift from “pay and chase,” where CMS paid claims and then attempted to recover improper payments after the fact, to “prevent and detect,” where AI-powered analysis flags suspicious claims before payment. Kim Brandt, CMS Deputy Administrator and Chief Operating Officer, reported that the FDOC has saved over $2 billion in improper payments since its launch. The center identified $2.6 billion in Medicare overpayments across 3,262 providers in 2025.
The FDOC’s AI capabilities operate on a real-time basis, analyzing claims as they arrive rather than in retrospective batch processing. Brandt described the system as “a Netflix-type algorithm” that labels applicants as potentially high-risk based on pattern matching against known fraud indicators. However, she emphasized the human oversight requirement: “AI is great to help us say, ‘Hey, here are areas you want to focus on,’ but then we need to actually validate that.”
The impact has been measurable in specific categories. CMS achieved a 99% decrease in skin and tissue substitute billing after AI detection identified “impossibly high levels” of claims, a result that demonstrates both the scale of fraudulent billing in targeted categories and the effectiveness of real-time intervention.
WISeR Model. CMS launched the Wasteful and Inappropriate Service Reduction (WISeR) model to test whether enhanced technologies, including AI, can expedite prior authorization for items and services particularly vulnerable to FWA. The initial focus covers skin and tissue substitutes, electrical nerve stimulator implants, and knee arthroscopy for knee osteoarthritis, all categories with documented patterns of inappropriate billing.
CRUSH Initiative. In February 2026, CMS issued a Request for Information under its Comprehensive Regulations to Uncover Suspicious Healthcare (CRUSH) initiative, soliciting stakeholder feedback on potential regulatory changes to strengthen fraud prevention. The RFI covers enhanced identity verification for Medicare-enrolled entities, strengthened Medicare Advantage preclusion lists to prevent revoked providers from billing MA plans, laboratory testing safeguards for genetic and molecular tests experiencing elevated spending, AI-assisted coding oversight for medical record reviews, reduced filing deadlines for high-risk items, and expanded beneficiary contact restrictions.
State-Level Enforcement. The federal enforcement push is cascading to states. CMS deferred approximately $259.5 million in federal Medicaid matching funds from Minnesota over alleged program integrity failures and threatened to withhold over $1 billion if vulnerabilities persisted. The agency directed freezes on enrollment for 13 categories of Medicaid providers. In Texas, Governor Abbott directed increased Medicaid fraud enforcement with a focus on fully staffed Special Investigations Units. The House Committee on Energy and Commerce sent investigative letters to governors in ten states requesting detailed Medicaid program integrity information.
DOJ Coordination. The Department of Justice’s 2025 National Health Care Fraud Takedown charged 324 defendants, including 96 licensed medical professionals, across 50 federal districts. The $14.6 billion in alleged fraud more than doubled the prior record. Notably, the DOJ announced a Health Care Fraud Data Fusion Center to “bring together experts to leverage cloud computing, artificial intelligence, and advanced analytics to identify emerging health care fraud schemes,” signaling permanent investment in technology-driven enforcement.
Integrating Explainable AI Into Health Plan Payment Integrity Programs
For health plan actuaries involved in payment integrity, the CMS competition result provides a practical framework for AI procurement and program design. Patterns we’ve seen in recent payment integrity program reviews suggest that most health plans are still operating with rules-based systems supplemented by ad hoc analytics, not the integrated AI-driven approaches that CMS is now endorsing.
The integration pathway involves several concrete steps:
1. Align vendor procurement with CMS evaluation criteria. When evaluating fraud detection vendors, require that proposed solutions demonstrate explainability at the individual-provider level. The CMS competition criteria provide a ready-made evaluation rubric: can the system produce transparent reasoning accessible to program integrity teams? Can investigators audit the logic chain behind each flag? If a vendor cannot demonstrate this for a Medicare claims dataset, the solution is unlikely to satisfy the compliance direction CMS is setting.
2. Map AI risk scores to existing SIU triage workflows. Explainable AI outputs should integrate with Special Investigation Unit (SIU) case management systems, not replace them. The glass-box approach works because it produces structured evidence (behavioral anomalies, network patterns, financial outliers) that investigators can use to prioritize cases, build investigation files, and document findings. The AI score becomes the triage layer; the SIU team remains the decision layer.
3. Build actuarial validation protocols for fraud model outputs. Health plan actuaries should treat AI fraud detection models with the same rigor applied to pricing or reserving models. This means testing for stability across time periods, validating that flagged providers genuinely exhibit anomalous patterns (not just high costs), and monitoring false positive rates against recovery yields. ASOP No. 56 (Modeling) provides the professional standards framework for this validation work.
4. Quantify payment integrity savings for rate filings. Health plans incorporating AI-driven payment integrity programs need defensible estimates of fraud savings for rate filing support. Explainable AI makes this quantification more reliable because the transparent scoring methodology produces measurable outputs: number of providers flagged, investigation outcomes, dollars recovered, and false positive rates. These metrics can be translated into PMPM payment integrity credits in pricing models, subject to appropriate credibility standards.
| Integration Step | Actuarial Role | Key Metric |
|---|---|---|
| Vendor evaluation | Define explainability requirements in procurement specs | Provider-level reasoning chain available (yes/no) |
| SIU integration | Calibrate risk score thresholds for case referral | Case referral volume, investigation close rate |
| Model validation | Test stability, false positive rates, recovery yields | Positive predictive value, recovery-to-flag ratio |
| Rate filing support | Translate AI outputs into PMPM payment integrity credits | Credibility-weighted PMPM savings estimate |
| CRUSH compliance | Align program documentation with anticipated regulatory requirements | Audit trail completeness, response time capability |
5. Prepare for CRUSH regulatory requirements. The February 2026 CRUSH RFI signals that CMS is developing formal regulations around fraud prevention capabilities. Health plans that proactively adopt explainable AI for payment integrity will be better positioned when proposed rules emerge. The regulatory direction is clear: CMS wants program integrity systems that produce transparent, auditable results. Plans relying solely on opaque vendor models may face compliance gaps.
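Steps 3 and 4 above lend themselves to a worked sketch. The validation metrics mirror the table (observed PPV and recovery-to-flag ratio), and the rate filing piece uses a classical limited-fluctuation credibility blend; the full-credibility standard of 1,200,000 member months and all input figures are illustrative assumptions, not regulatory or ASOP-prescribed values.

```python
from math import sqrt

def validation_metrics(flags, confirmed, recovered_dollars):
    """Observed PPV and recovery-to-flag ratio from SIU case outcomes.

    flags: providers flagged by the model; confirmed: flags where the
    investigation substantiated improper billing.
    """
    return {"ppv": confirmed / flags,
            "recovery_per_flag": recovered_dollars / flags}

def credibility_weighted_pmpm(observed_savings, member_months,
                              prior_pmpm, full_cred_mm=1_200_000):
    """Blend observed payment integrity savings PMPM with a prior
    assumption using limited-fluctuation credibility.

    full_cred_mm is an illustrative full-credibility standard; actual
    standards should follow the plan's credibility methodology.
    """
    z = min(1.0, sqrt(member_months / full_cred_mm))
    observed_pmpm = observed_savings / member_months
    return z * observed_pmpm + (1 - z) * prior_pmpm

# Example: $1.5M observed savings over 300,000 member months against a
# $4.00 PMPM prior yields Z = 0.5 and a blended credit of $4.50 PMPM.
credit = credibility_weighted_pmpm(1_500_000, 300_000, 4.00)
```

The point of the structure, rather than the specific numbers, is that every figure feeding the PMPM credit (flag counts, confirmation rates, recoveries) comes from auditable model outputs, which is what makes the filing assumption defensible.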
Why This Matters for Health Actuaries
The CMS competition result carries three specific implications for health actuaries.
Explainability is becoming a regulatory standard, not a preference. When CMS designs a competition that explicitly requires explainable AI, selects a glass-box model over black-box alternatives, and simultaneously invests billions in AI-powered fraud detection infrastructure, the direction is unmistakable. Health plan actuaries involved in vendor selection, model governance, or compliance should expect that future CRUSH regulations will formalize explainability requirements for payment integrity technology.
Payment integrity economics are shifting from retrospective to prospective. The FDOC’s “prevent and detect” model, which has saved $2 billion since March 2025, changes the financial dynamics of fraud exposure. Plans that invest in real-time, explainable fraud detection can reduce claim payments before they become recoverable overpayments, improving cash flow and reducing the uncertainty around FWA reserve estimates. This shifts the actuarial question from “how much can we recover?” to “how much can we prevent?” and the answer is increasingly quantifiable.
The FWA savings estimate in rate filings needs better support. Milliman’s demonstration that explainable AI can deliver measurable, auditable fraud detection results, combined with the $230 million in recoverable claims identified through its Mastercard partnership, provides a benchmark for what AI-driven payment integrity can deliver. Health plan actuaries incorporating FWA savings into pricing assumptions should document the methodology behind those estimates with the same transparency that CMS is demanding from the AI tools themselves. Conservative estimates supported by explainable methodology will hold up better with regulators than aggressive projections from opaque models.
The broader pattern across CMS actions in 2025 and 2026, from the FDOC launch to the Chili Cook-Off to the CRUSH RFI to the WISeR model, is a federal agency committing to technology-driven program integrity with explainability as the non-negotiable design requirement. Health plan actuaries who understand this trajectory can shape their organizations’ payment integrity investments accordingly.
Further Reading
- MHPAEA 2026: Health Actuaries Must Now Prove Parity Holds – Technical walkthrough of MHPAEA compliance and the data-driven comparative analysis framework, with parallels to the documentation and transparency standards CMS is requiring for AI-powered fraud detection programs.
- CMS Star Ratings Overhaul: $18.6B in MA Quality Bonuses at Stake – How CMS is restructuring the Star Ratings methodology that drives Medicare Advantage quality bonuses, providing broader context for the agency’s data-driven approach to program management.
- CMS 2027 MA Final Rule: 2.48% Rate Increase vs. 0.09% NPRM – Component decomposition of the Medicare Advantage payment environment that shapes the financial incentives around program integrity investment.
- CBO Part D Spending Forecasts and the $500B Actuarial Gap – The Medicare Part D spending trajectory that makes fraud prevention in pharmaceutical and DME claims increasingly urgent.
- The AI Governance Gap in Actuarial Practice – How the actuarial profession is building governance frameworks for AI tools, directly relevant to the model validation and ASOP No. 56 compliance requirements for payment integrity AI.
Sources
- GovCIO Media & Research: CMS Uses Explainable AI to Strengthen Medicare Fraud Detection
- CMS: Crushing Fraud Chili Cook-Off Competition
- Morningstar/Business Wire: Milliman Wins CMS Crushing Fraud Chili Cook-Off Competition (December 2025)
- Nextgov: CMS Seeks to Expand Tech-Driven Fight Against Medicaid Fraud (March 2026)
- Nextgov: CMS Saved $2 Billion by Using AI to Fight Fraud (February 2026)
- Foley & Lardner: New Federal Focus on Fraud, Waste and Abuse May Signal Changes for the Health Care Industry (April 2026)
- Milliman: Payment Integrity and AI Express
- Milliman: Effective Claims Auditing for Healthcare Fraud, Waste, and Abuse
- U.S. Department of Justice: 2025 National Health Care Fraud Takedown (324 Defendants, $14.6 Billion)
- CMS: Fiscal Year 2025 Improper Payments Fact Sheet
- Morgan Lewis: CMS Announces Sweeping Anti-Healthcare Fraud Initiatives (February 2026)