Our modeling of star ratings distributions across the top 25 MA plans over the past three years suggests the shift to roughly 65% clinical weight will compress the gap between 4-star and 5-star plans, making the new Depression Screening and Follow-Up measure a potential tiebreaker for bonus eligibility. The CMS CY 2027 final rule (CMS-4207-F), published April 2, 2026, and effective June 1, 2026, removes 11 measures that most plans already aced; the administrative measures being dropped had average scores above 94% across the industry. When measures with near-universal high performance disappear from the denominator, the remaining clinical and patient experience measures carry proportionally more weight in the overall calculation. For the 40% of MA-PD contracts currently hovering between 3.5 and 4.0 stars, understanding exactly which measures now drive scoring is not an academic exercise; it determines whether a contract qualifies for the 5% quality bonus that can be worth tens of millions in annual revenue.

  • ~65%: Estimated share of Star Rating driven by clinical outcomes and patient experience after measure removals
  • 11: Administrative and process measures removed from scoring, phased 2028-2029
  • 2: Scored components in the new Depression Screening measure: screening rate and 30-day follow-up rate

How Clinical Weight Reaches 65%

Before the CY 2027 changes, MA-PD contracts were evaluated on up to 43 Star Rating measures spanning clinical outcomes, patient experience (CAHPS surveys), pharmacy quality, and administrative processes. The administrative and process measures, including call center availability, appeals timeliness, complaints, and member retention, collectively represented roughly 25 to 30% of the overall score weight depending on contract type. Clinical outcome measures (primarily HEDIS clinical quality measures) and patient experience measures each carried roughly 25 to 30%.

Removing 11 measures that fall predominantly in the administrative and process category mechanically redistributes their weight across the remaining measures. CMS did not change the weighting methodology itself; the Star Rating still uses a clustering algorithm that groups plan performance into 1-through-5 star categories for each measure, then applies measure-level weights (1x for process, 1.5x for intermediate outcomes, 3x for outcomes, and 1.5x for patient experience) before computing the overall score. But when the pool of 1x-weighted process measures shrinks substantially, the remaining higher-weighted clinical and patient experience measures constitute a larger share of the total.

The math works out roughly as follows. Of the 11 removed measures, most carried a 1x process weight. Removing them eliminates approximately 11 weight units from the denominator. Adding the Depression Screening and Follow-Up measure (likely weighted at 1x or 1.5x as a new process/intermediate outcome hybrid) partially offsets this reduction. CMS also added three other clinical measures for the 2027 Star Ratings: Care for Older Adults Functional Status Assessment, Concurrent Use of Opioids and Benzodiazepines, and Polypharmacy: Use of Multiple Anticholinergic Medications in Older Adults. Each carries a weight of 1. The net effect is that the total weight denominator drops, and the share attributable to measures weighted at 1.5x or 3x (clinical outcomes and patient experience) increases from approximately 50% to roughly 65%.
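The renormalization above can be sketched numerically. The weight totals below are stylized assumptions chosen to reproduce the article's approximate shares, not the actual CMS measure list:

```python
# Stylized renormalization of Star Rating weight shares after the measure
# changes. PRE_TOTAL and CLINICAL_PX are illustrative assumptions, not the
# actual CMS weight totals.

PRE_TOTAL = 30.0      # assumed total weight units across all scored measures
CLINICAL_PX = 15.0    # assumed weight units on 1.5x/3x clinical + CAHPS measures

REMOVED = 11.0        # ~11 weight units of 1x process measures dropped
ADDED = 4.0           # four new 1x clinical measures added

pre_share = CLINICAL_PX / PRE_TOTAL
post_share = CLINICAL_PX / (PRE_TOTAL - REMOVED + ADDED)

print(f"clinical + patient experience share: {pre_share:.1%} -> {post_share:.1%}")
# clinical + patient experience share: 50.0% -> 65.2%
```

The mechanic, not the specific numbers, is the point: the clinical weight units are unchanged, but the denominator shrinks by a net 7 units, so their share rises.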

Post-Removal Measure Category Weights (Estimated for MA-PD)

| Measure Category | Pre-Removal Share | Post-Removal Share | Direction |
| --- | --- | --- | --- |
| Clinical Outcomes (HEDIS, 3x weight) | ~25-28% | ~30-35% | Gains most |
| Patient Experience (CAHPS, 1.5x weight) | ~25-28% | ~28-32% | Moderate gain |
| Pharmacy Quality | ~12-15% | ~15-18% | Moderate gain |
| Administrative/Process (1x weight) | ~25-30% | ~12-18% | Largest reduction |

The combined clinical outcomes and patient experience share of approximately 65% represents a structural shift in what drives plan revenue. Plans that invested heavily in call center operations, appeals processing speed, and complaint reduction were earning star-level credit for activities that no longer contribute to the score. Plans that invested in HEDIS measure performance, provider quality networks, and member experience will find those investments carrying more scoring power per dollar spent.

Which Remaining Measures Gain the Most Relative Importance

Not all clinical measures benefit equally from the weight redistribution. Under CMS's tiered weighting system, outcome measures at 3x weight gain disproportionately when lower-weighted process measures are removed. The measures that plan actuaries should watch most closely fall into three tiers.

Tier 1: Triple-Weighted Outcomes (Highest Marginal Impact)

The following Part C outcome measures carry 3x weight and will exert the greatest influence on overall star ratings under the compressed measure set:

  • Controlling Blood Pressure: Percentage of members 18 to 85 with hypertension whose blood pressure was adequately controlled. This measure has shown meaningful variation across plans (unlike the removed measures), making it a genuine differentiator.
  • Diabetes Care: HbA1c Poor Control (>9.0%): An inverse measure where lower rates indicate better performance. Plans with strong endocrinology networks and care management programs score well; plans without them face a 3x penalty.
  • Plan All-Cause Readmissions: 30-day hospital readmission rate, risk-adjusted. This measure ties directly to care coordination quality and has become a proxy for integrated care model effectiveness.

Tier 2: Intermediate Outcome and Patient Experience Measures (1.5x Weight)

CAHPS patient experience measures carry 1.5x weight and collectively represent the second-largest scoring block. The key CAHPS composites include Getting Needed Care, Getting Appointments and Care Quickly, and Rating of Health Plan. These survey-based measures are inherently noisy at the contract level, with confidence intervals wide enough that a plan's score can shift half a star between measurement years purely from sampling variation. Under the new framework, that noise carries more weight in the overall score, which increases forecasting uncertainty for actuaries modeling multi-year star ratings trajectories.
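The sampling-noise point can be made concrete with a quick standard-error calculation. The survey count and composite score below are hypothetical:

```python
import math

n = 800      # hypothetical completed CAHPS surveys for one contract
p = 0.85     # hypothetical composite score, treated as a proportion

se = math.sqrt(p * (1 - p) / n)   # standard error of a sample proportion
ci = 1.96 * se                    # half-width of a 95% confidence interval

print(f"95% CI: {p:.2f} +/- {ci:.3f}")  # roughly +/- 2.5 percentage points
```

If clustering cut points sit within a few percentage points of each other, an interval this wide can move a measure-level star assignment between years without any true change in performance.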

Tier 3: New Clinical Measures (1x Weight but Fresh Scoring Variance)

The four new measures added for the 2027 and 2029 measurement years carry 1x weight individually, but their introduction creates fresh variance in the scoring distribution. Plans have no historical performance data on these measures, which means the clustering algorithm will produce star-level assignments that may not correlate with historical performance on the removed measures. For score forecasting purposes, these measures introduce the most uncertainty per unit of weight.

Depression Screening: The Unknown Variable in 2029 Star Ratings

Of all the changes in the CY 2027 final rule, the new Depression Screening and Follow-Up measure presents the most complex modeling challenge for plan actuaries. The measure tracks two distinct rates that CMS will average to determine the star rating:

  1. Screening Rate: The percentage of eligible plan members aged 12 and older who were screened for clinical depression using a standardized instrument (typically the PHQ-2 or PHQ-9 Patient Health Questionnaire) during the measurement year.
  2. Follow-Up Rate: The percentage of members who screened positive and received documented follow-up care within 30 days of the positive screen. Follow-up care includes a referral to a behavioral health provider, a documented treatment plan, pharmacotherapy, or additional evaluation.

The two-rate averaging creates a mathematical dynamic where plans cannot score well on screening volume alone. A plan that screens 95% of eligible members but achieves only 55% follow-up within 30 days produces an average score of 75%, which will likely fall well below the threshold for 4 or 5 stars once the clustering algorithm groups plan performance. The follow-up rate is the harder component to control because it depends on behavioral health provider availability, appointment scheduling capacity, and member engagement in a clinical area with historically high no-show rates.
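The averaging dynamic is simple to verify. The rates below are the illustrative ones from the example above:

```python
def depression_measure_score(screening_rate: float, followup_rate: float) -> float:
    """Simple average of the two scored components described above."""
    return (screening_rate + followup_rate) / 2

# High screening volume cannot rescue weak follow-up:
print(f"{depression_measure_score(0.95, 0.55):.0%}")  # 75%
print(f"{depression_measure_score(0.80, 0.80):.0%}")  # 80% -- balanced performance wins
```

A plan that trades five points of screening penetration for twenty-five points of follow-up compliance comes out ahead, which is why the follow-up infrastructure, not the screening rollout, is the binding constraint.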

Why Data Scarcity Makes This Measure Hard to Forecast

From tracking quality measure data across major MA plans, we see a specific data problem with the depression screening measure: most plans do not currently report screening and follow-up rates in a format that maps directly to the CMS measure specification. The eCQI measure (CMS002v13) aligns with USPSTF depression screening recommendations, but MA plans have historically treated depression screening as a primary care workflow item rather than a quality measure that affects plan-level star ratings.

The data gaps break down into three categories:

Screening penetration baseline. Plans with integrated behavioral health models, particularly staff-model HMOs and plans with embedded psychiatry or psychology services, likely screen at rates above 80%. Plans relying on independent practice associations and loosely managed PPO networks may screen at rates below 50%, with wide geographic variation. The problem is that most plans do not currently aggregate this data at the contract level because there has been no star ratings incentive to do so.

Follow-up documentation. Even when members receive follow-up care after a positive depression screen, the documentation may not meet CMS measure specifications. A primary care physician who discusses antidepressant options with a patient may not code the encounter in a way that satisfies the measure's follow-up definition. Plans will need to build reporting infrastructure that captures both the screening instrument used and the follow-up action taken, with sufficient clinical detail to withstand CMS audit.

30-day window compliance. The 30-day follow-up window is operationally tight, particularly for plans serving rural markets where behavioral health providers are scarce. According to HRSA data, approximately 160 million Americans live in designated Mental Health Professional Shortage Areas. For MA plans with significant rural enrollment, the 30-day window may require investing in telehealth behavioral health networks, which introduces both cost and credentialing complexity.

A Worked Example: Score Sensitivity to Depression Screening Performance

Consider a plan currently rated at 3.75 stars on a scale that, after the measure removals, tips to 65% clinical and patient experience weight. Adding a depression screening measure on which the plan scores 3 stars (mediocre screening penetration, poor follow-up compliance) pushes the weighted average down. If the plan scores 5 stars on the depression measure instead (high screening, strong follow-up), the weighted average moves up. For a plan at the 3.75-to-4.0 margin, this single measure can be the difference between qualifying for the 5% quality bonus and missing it.

At the contract level, the revenue difference between a 3.75-star and a 4.0-star rating on a 100,000-member contract with a $1,100 monthly benchmark is approximately $66 million per year in quality bonus payments. A plan-level investment of $5 to $10 million in depression screening infrastructure, provider training, and care coordination capacity generates a potential 6x to 13x return if it pushes the contract above the 4.0-star threshold. That return calculation, however, depends entirely on the plan's confidence in its score forecast, which brings us to the broader modeling challenge.
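The bonus arithmetic in this example is straightforward to reproduce:

```python
members = 100_000
monthly_benchmark = 1_100   # dollars per member per month
bonus_rate = 0.05           # 5% quality bonus at 4.0+ stars

annual_bonus = members * monthly_benchmark * 12 * bonus_rate
print(f"annual quality bonus: ${annual_bonus / 1e6:.0f}M")  # $66M

for investment in (5e6, 10e6):  # the $5-10M infrastructure range above
    print(f"return on ${investment / 1e6:.0f}M invested: {annual_bonus / investment:.1f}x")
# return on $5M invested: 13.2x
# return on $10M invested: 6.6x
```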

Score Forecasting Under the New Measure Set

Star ratings forecasting has always been challenging because CMS uses a clustering algorithm rather than fixed thresholds. The algorithm groups all plan scores for each measure into five clusters, meaning that a plan's star rating on any given measure depends not just on its own performance but on the distribution of performance across all plans. When the measure set changes, the distributions shift, and historical patterns become unreliable predictors of future clustering boundaries.

The 11-measure removal compounds this problem in three specific ways:

1. Loss of Score Stability Anchors

The removed administrative measures had extremely tight performance distributions. Call center availability scores averaged 94% or higher across both Part C and Part D. Appeals timeliness improved from 90% to 96% between 2015 and 2025. Complaints averaged 0.23% for MA-PD contracts. These measures functioned as scoring anchors: nearly all plans received 4 or 5 stars on them, which stabilized overall scores and reduced year-over-year volatility.

Without these anchors, overall star ratings become more sensitive to variation in the remaining measures. A plan that previously absorbed a poor year on one HEDIS measure because its administrative scores compensated now has less cushion. The implication for actuarial forecasting models is that the confidence interval around projected star ratings should widen, at least for the first two measurement cycles under the new framework.

2. New Measure Calibration Uncertainty

The Depression Screening measure and the three other new measures (Functional Status Assessment, Opioid-Benzodiazepine Concurrent Use, Polypharmacy) have no historical performance data within the star ratings system. CMS will need at least one measurement year to establish the clustering boundaries for these measures. Plan actuaries forecasting 2029 star ratings (based on the 2027 measurement year, which is the first full year under the new measure set) are working with no precedent for how the algorithm will distribute star levels on these measures.

Patterns we have seen in prior measure introductions suggest that first-year clustering boundaries tend to be wider than in subsequent years, because plan performance is more dispersed when a measure is new. This dispersion can benefit early adopters who invest in compliance ahead of the measurement year, as they may earn 5-star ratings on new measures simply by being ahead of the pack.

3. Interaction Effects Between Measure Removals and Additions

The scoring impact of removing 11 measures and adding 4 is not simply additive. The CMS clustering algorithm operates on the full measure set simultaneously, and the removal of low-variance measures combined with the introduction of high-variance new measures changes the shape of the overall score distribution. Analysis by Sevana Health based on the 2025 Star Ratings found that 63% of contracts would see no change in overall rating, 13% would gain half a star, and 24% would lose half a star under the removals alone. Press Ganey estimated approximately one-quarter of contracts could lose half a star. These estimates, however, were based on removing measures from the existing score; they do not fully account for the re-clustering effect when new measures enter the calculation.

For actuarial modeling purposes, the most defensible approach is scenario analysis. Model three cases: a base case where the contract's star rating remains unchanged, an upside case where scoring compression and strong new-measure performance push the rating up half a star, and a downside case where the loss of administrative score cushion and mediocre new-measure performance push the rating down half a star. Weight these scenarios using whatever internal quality data the plan has on its likely performance on the new measures, and use the weighted average in revenue projections.
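The three-case approach can be expressed as a small expected-value calculation. The star outcomes and probabilities below are placeholders a plan would replace with its own internal quality data:

```python
def bonus_revenue(stars: float, benchmark_revenue: float, bonus_rate: float = 0.05) -> float:
    """Simplified: the 5% quality bonus applies at 4.0 stars and above."""
    return benchmark_revenue * bonus_rate if stars >= 4.0 else 0.0

benchmark = 100_000 * 1_100 * 12   # illustrative contract from the worked example

scenarios = [          # (projected stars, probability) -- placeholder weights
    (3.75, 0.50),      # base: rating unchanged
    (4.25, 0.25),      # upside: +0.5 star
    (3.25, 0.25),      # downside: -0.5 star
]

expected_bonus = sum(p * bonus_revenue(s, benchmark) for s, p in scenarios)
print(f"probability-weighted bonus revenue: ${expected_bonus / 1e6:.1f}M")  # $16.5M
```

The step function at 4.0 stars is what makes the scenario weights matter: only the upside case earns any bonus, so the expected value is driven almost entirely by the plan's confidence in crossing the threshold.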

Sequencing Clinical Investments for Maximum Score Impact

Plan actuaries advising clinical leadership on quality investment priorities should think about the new measure framework as a portfolio optimization problem. Each clinical investment has a cost, an expected score improvement, and a timeline to impact. The 2027 measurement year begins January 1, 2027, giving plans approximately seven months from the rule's June 1, 2026 effective date to launch new clinical programs that will generate measurable results within the measurement period.

Immediate Priority: Depression Screening Infrastructure (June 2026 to December 2026)

Plans that lack systematic depression screening workflows need to build them before the 2027 measurement year. The minimum viable investment includes: deploying a standardized screening instrument (PHQ-2 as an initial screen, PHQ-9 for positive screens) across all primary care encounters for members aged 12 and older; building electronic health record (EHR) reporting to capture both screening administration and follow-up actions; establishing a behavioral health referral pathway with telehealth capacity for rural markets; and training care coordinators on the 30-day follow-up documentation requirements.

Plans with existing integrated behavioral health models have a structural advantage. Kaiser Permanente's staff-model approach, for example, already embeds behavioral health screening in primary care workflows. Plans that contract with independent physician associations will need to negotiate quality reporting requirements into provider contracts, which adds both time and cost to the implementation.

High-ROI HEDIS Investments: Blood Pressure and Diabetes Management

The triple-weighted outcome measures, particularly Controlling Blood Pressure and Diabetes Care HbA1c Poor Control, offer the highest per-unit return on quality investment because each star-level improvement on these measures moves the overall score three times as much as a process measure improvement. Plans that can improve their blood pressure control rate by 5 percentage points through targeted care management, home blood pressure monitoring programs, and pharmacist-led medication management may see a full star-level improvement on this measure, with a 3x weighted impact on the overall score.
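The 3x leverage is visible in a simple marginal-impact calculation. The weight denominator below is an illustrative post-removal total, not a CMS figure, and the actual rating applies clustering and rounding on top of this, so the deltas are directional:

```python
TOTAL_WEIGHT = 23.0   # illustrative post-removal sum of measure weights

def overall_star_delta(star_change: float, measure_weight: float) -> float:
    """Change in the weighted-average overall score from one measure."""
    return star_change * measure_weight / TOTAL_WEIGHT

print(f"{overall_star_delta(1.0, 3.0):.3f}")  # 0.130 -- one-star gain on a 3x outcome measure
print(f"{overall_star_delta(1.0, 1.0):.3f}")  # 0.043 -- same gain on a 1x process measure
```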

CAHPS Stabilization: Managing Survey Volatility

Patient experience measures are notoriously difficult to influence because they depend on subjective member perceptions and are subject to sampling noise. However, plans can reduce CAHPS-driven score volatility through two mechanisms: increasing the effective sample size by improving survey response rates, and addressing the operational drivers of poor experience scores (hold times, prior authorization delays, provider directory accuracy). These investments do not guarantee score improvements, but they reduce the probability of downside CAHPS surprises that could offset gains from clinical measures.

The Bonus Tiebreaker: Why Behavioral Health Determines Financial Viability

The concentration of MA contracts near the 4.0-star threshold is what makes the depression screening measure financially consequential. The 2026 Star Ratings showed an average overall rating of 3.66, a third of a star below the 4.0 bonus threshold. Approximately 40% of MA-PD contracts earned 4.0 stars or above, meaning the remaining 60% sit below the bonus line. Within that 60%, a significant portion clusters in the 3.5 to 3.99 range, where a half-star improvement would trigger bonus eligibility.

Under the pre-removal measure set, these borderline contracts had multiple pathways to cross the 4.0 threshold: improve clinical measures, improve administrative measures, or improve patient experience. The measure removals eliminate the administrative pathway. The remaining options are clinical performance and patient experience. Since CAHPS is volatile and difficult to influence in the short term, clinical measures become the most reliable lever for plans seeking to cross the bonus line.

Within the clinical measure set, the new Depression Screening measure is unique because it introduces a measure on which no plan has an established track record. Plans that invest early and perform well on this measure will gain a scoring advantage precisely in the measurement years when the administrative measure cushion disappears. Plans that delay or underinvest in depression screening face a scoring disadvantage on a measure that, combined with other clinical measures, constitutes the majority of the overall weight.

The financial stakes are substantial. Federal spending on MA quality bonus payments reached at least $12.7 billion in 2025, averaging $372 per enrollee (KFF). Roughly 75% of MA enrollees were in plans receiving some form of bonus payment. Since 2015, cumulative bonus spending has exceeded $87 billion. The clinical reweighting will redistribute a portion of this spending based on which plans execute the clinical transition most effectively.

Financial Modeling: Integrating the Weight Shift into Bid Strategy

For plan actuaries building 2027 bids, the clinical reweighting interacts with two other major CY 2027 policy changes. The 2.48% base rate increase lifts benchmarks for all plans, while the chart review exclusion reduces risk-adjusted revenue by an estimated 1.53% (over $7 billion industry-wide in 2027). The star ratings changes sit on top of these adjustments, selectively increasing benchmarks for plans that cross or maintain the 4.0-star threshold.

The bid strategy implications break down by current star level:

Plans at 4.5 to 5.0 stars: Already receiving the maximum quality bonus. The clinical reweighting is unlikely to change their star level, but it may change which measures drive that level. These plans should audit their performance on the measures that now carry more weight to ensure they maintain their rating without the administrative cushion.

Plans at 4.0 to 4.49 stars: Receiving the 5% bonus but potentially vulnerable to losing it if the measure removals expose previously masked clinical weaknesses. Defensive investment in the triple-weighted HEDIS measures and the new depression screening measure is warranted.

Plans at 3.5 to 3.99 stars: The primary beneficiaries of the clinical reweighting if they invest appropriately. A contract at 3.75 stars that improves from 3 to 5 stars on the Depression Screening measure and maintains performance on existing clinical measures could cross the 4.0 threshold, triggering the 5% benchmark bonus. The incremental revenue from this transition should be modeled explicitly in the bid, with probability-weighted scenarios reflecting the plan's confidence in its clinical investment payoff.

Plans below 3.5 stars: Unlikely to cross the 4.0 threshold from the clinical reweighting alone. These plans face a more fundamental challenge: their clinical performance is already below industry norms, and the increased weight on clinical measures may push their scores further from the bonus line. For these contracts, the bid strategy should focus on realistic star-level assumptions rather than aspirational ones.

What the Scoring Compression Leaves Unresolved

Our prior analysis of the $18.6B star ratings windfall detailed the fiscal impact and the structural problems the overhaul leaves unaddressed: medical trend running at 7 to 10% annually, the V28 risk adjustment compression, and benefit sustainability concerns. The clinical reweighting adds a fourth unresolved issue: the behavioral health infrastructure gap.

HRSA data indicates that approximately 160 million Americans live in Mental Health Professional Shortage Areas. For MA plans serving these populations, the Depression Screening measure creates a structural disadvantage that investment alone may not overcome in the short term. Recruiting behavioral health providers takes 12 to 18 months; building telehealth networks takes 6 to 12 months; and training primary care providers on standardized screening protocols takes 3 to 6 months. Plans that begin these investments after the rule's June 2026 effective date may not have full operational capacity until mid-2027, meaning their 2027 measurement year data will reflect partial implementation.

This timeline mismatch between policy implementation and operational readiness is a recurring pattern in CMS quality measurement. The agency announces measures with enough lead time for well-resourced national carriers to comply, but regional plans and those serving underserved communities face a steeper implementation curve. The clinical reweighting amplifies this dynamic by increasing the scoring consequences of clinical infrastructure gaps.

Why This Matters for Plan Actuaries

The shift to 65% clinical weight in the Star Ratings framework represents the most consequential scoring methodology change since CMS introduced the quality bonus payment system. For plan actuaries, the practical implications center on three areas.

First, revenue projections must incorporate wider confidence intervals around star ratings for the 2027 and 2028 measurement years. The loss of administrative score anchors and the introduction of new clinical measures with no performance history create forecasting uncertainty that should be reflected in reserve assumptions and premium deficiency testing.

Second, quality investment ROI analysis must be updated to reflect the new measure weights. Actuaries who previously modeled star ratings sensitivity to clinical measure improvements can increase the ROI multiplier on those investments by the ratio of new-to-old clinical weight (roughly 1.3x). Investments in administrative processes that no longer contribute to star ratings should be redeployed.

Third, the depression screening measure creates a new actuarial data need. Plans that have not historically collected contract-level depression screening and follow-up data need to begin doing so immediately, both to inform clinical investment decisions and to provide the baseline data necessary for actuarial modeling of the measure's impact on projected star ratings. Without this data, star ratings forecasts for the 2029 Stars (2027 measurement year) will have unacceptably wide confidence intervals.

The plans that navigate this transition most effectively will be those that treat the clinical reweighting not as a one-time scoring adjustment but as a permanent shift in how CMS defines plan quality. The administrative measures are not coming back. The future of MA plan viability runs through clinical outcomes, patient experience, and behavioral health.

