On May 28, 2026, Liberty Mutual became the first major US carrier to embed its own rating engine inside a foundation model platform, making personal auto insurance quotes available through natural language conversation in Arizona, Kentucky, Ohio, Missouri, New Mexico, Utah, and Wisconsin. The company plans to expand to more than 40 states by year-end. Coverage in trade press focused on the distribution angle: a Fortune 100 carrier with $50.5 billion in consolidated 2025 revenue meeting consumers on a platform with hundreds of millions of weekly active users, bypassing comparison sites and the traditional web form altogether. The actuarial story is more specific and more consequential. Liberty Mutual’s rating engine was calibrated against structured inputs. It is now receiving information collected as free-text conversation. Those are not the same thing, and the gap between them is where rate adequacy risk enters the transaction.

Unlike aggregator applications that present estimated price ranges and redirect consumers to multiple carrier sites, the Liberty Mutual ChatGPT app routes the quote through the company’s own rating algorithm to produce a single, personalized premium. After answering questions conversationally, users are directed to LibertyMutual.com to finalize coverage and bind. Tyler Asher, Chief Distribution and Marketing Officer for US Retail Markets, described the goal as meeting consumers wherever they prefer to shop: “Consumers have more choice than ever in how they choose to shop, including emerging channels like conversational AI and large language model platforms. It’s beneficial to customers that we meet them wherever they are.” The actuarial question is not about distribution preference. It is about whether the inputs that arrive from a natural language conversation are equivalent, in precision and completeness, to the structured inputs the rating engine was designed to receive.

What a Personal Auto Rating Engine Expects

Personal auto rating algorithms are not designed to handle ambiguity at input. Each field has a specific format, a defined lookup table, and actuarial support built from years of loss cost analysis at a precise level of granularity. The garaging ZIP code determines the territorial factor, the primary geographic pricing variable in most state rating plans. Territorial factors in a metropolitan area like greater Phoenix, which spans more than 30 five-digit ZIP codes, can vary by 25 to 40 percent between the most favorable and least favorable segments. A garaging ZIP is not a city name or a neighborhood description. It is a five-digit postal code that resolves to a single factor in a filed table. Resolving a consumer’s description of their neighborhood to that specific five-digit code is a translation problem that the rating engine itself was never designed to perform.
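The lookup described above can be sketched in a few lines. This is a minimal illustration, not Liberty Mutual's implementation: the table `TERRITORY_FACTORS`, its factor values, and the function names are assumptions made for the example, and the factors are not taken from any filed rating plan.

```python
import re

# Hypothetical territorial factor table keyed by five-digit ZIP.
# Factor values are illustrative only, not from any filed rating plan.
TERRITORY_FACTORS = {
    "85001": 1.20,
    "85254": 0.90,
}

ZIP_PATTERN = re.compile(r"[0-9]{5}")


def territorial_factor(garaging_zip: str) -> float:
    """Return the factor for a five-digit ZIP and reject anything else."""
    if not ZIP_PATTERN.fullmatch(garaging_zip):
        raise ValueError(f"not a five-digit ZIP: {garaging_zip!r}")
    if garaging_zip not in TERRITORY_FACTORS:
        raise KeyError(f"ZIP {garaging_zip} is not in the factor table")
    return TERRITORY_FACTORS[garaging_zip]


def spread(high: float, low: float) -> float:
    """Relative difference between two factors, as a fraction of the lower one."""
    return high / low - 1.0
```

With these illustrative values, `spread(1.20, 0.90)` is roughly 0.33, inside the 25 to 40 percent range cited above. A city name or neighborhood description fails at the first check, which is the point: the table has no entry for anything other than a five-digit code.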

The vehicle identification number performs a comparable function for physical damage rating. A 17-character VIN decodes to the vehicle year, make, model, series, and body style, and that decode determines the vehicle symbol, the index variable that anchors comprehensive and collision base rates across the filing. A 2019 Toyota Camry has multiple trim levels: the LE, SE, SE Nightshade, XLE, XSE, and TRD each receive a distinct symbol assignment. Symbol-to-symbol variation in collision base rates within a single model year and nameplate typically ranges from 10 to 25 percent, depending on the carrier’s vehicle symbol schedule and loss experience. A consumer who describes their vehicle as “the sporty Camry from a few years ago” is providing information that requires disambiguation before any rated variable can be populated accurately.
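A structured VIN field can also be checked before it is decoded. The sketch below implements the check-digit scheme used for North American VINs (position 9 is computed from the other 16 characters). It validates format only and does not decode year, make, or trim; the function names are chosen for this example.

```python
# Character values for the VIN check-digit calculation.
# The letters I, O, and Q are not permitted in a VIN.
TRANSLIT = {
    **{str(d): d for d in range(10)},
    **dict(zip("ABCDEFGH", range(1, 9))),
    **dict(zip("JKLMN", range(1, 6))),
    "P": 7,
    "R": 9,
    **dict(zip("STUVWXYZ", range(2, 10))),
}

# Positional weights; position 9 (the check digit itself) has weight 0.
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def vin_check_digit(vin: str) -> str:
    """Compute the expected check digit for a 17-character VIN."""
    vin = vin.upper()
    if len(vin) != 17 or any(ch not in TRANSLIT for ch in vin):
        raise ValueError(f"not a well-formed VIN: {vin!r}")
    remainder = sum(TRANSLIT[ch] * w for ch, w in zip(vin, WEIGHTS)) % 11
    return "X" if remainder == 10 else str(remainder)


def is_valid_vin(vin: str) -> bool:
    """True when the VIN is well formed and its ninth character matches."""
    try:
        return vin_check_digit(vin) == vin.upper()[8]
    except ValueError:
        return False
```

A description such as “the sporty Camry from a few years ago” returns `False` immediately, so a structured form can refuse it at entry. A conversational layer has to decide what to do instead, and that decision is where the symbol assignment can go wrong.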

Driver date of birth carries the same precision requirement. Age factors in personal auto programs apply in single-year increments for young drivers and two- to five-year bands for older drivers on most filing plans. “Mid-thirties” can resolve to a different factor, and therefore a different premium, than “34 years old” when the rate plan uses single-year factors in that age range. Prior continuous insurance status, which affects the rate in most states through a continuity credit or surcharge, similarly requires specific answers: carrier name, effective dates, coverage type. Prior claims history requires a count of incidents, incident dates, and coverage types involved. Each of these variables was described in the actuarial memorandum supporting Liberty Mutual’s state rate filing as a defined, structured field. The conversational interface collects them as natural language responses.
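The age example can be made concrete. The sketch below assumes a hypothetical single-year factor table (`AGE_FACTORS`, with invented values) and shows how a date of birth resolves to exactly one factor while a phrase like “mid-thirties” is consistent with several.

```python
from datetime import date


def attained_age(dob: date, as_of: date) -> int:
    """Age in whole years on the as_of date."""
    years = as_of.year - dob.year
    # Subtract one year if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (dob.month, dob.day):
        years -= 1
    return years


# Hypothetical single-year age factors, illustrative values only.
AGE_FACTORS = {33: 0.98, 34: 0.97, 35: 0.96, 36: 0.95, 37: 0.95}


def factors_consistent_with(ages: range) -> set[float]:
    """All factors that a range of possible ages could resolve to."""
    return {AGE_FACTORS[age] for age in ages}
```

Under these assumed values, a driver known to be 34 maps to a single factor, while “mid-thirties” read as ages 33 through 37 is consistent with four different factors. Which one the interpretation layer picks is a rating decision that the filed plan never describes.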

This is the structured input assumption that underlies every personal auto rate filing. It is not a product of one carrier’s design choices. It is the consequence of how territorial rating, vehicle symbol schedules, and driver classification factors are built and validated. The actuarial support documentation for each factor was developed against data collected at that level of precision, and the filed rates are defensible in state rate review proceedings because that precision was maintained throughout the data pipeline. A conversational front-end that introduces a parsing layer between consumer statement and rated variable modifies that pipeline in a way the rate filing does not describe.

The Interpretation Gap Between Conversation and Structured Fields

The intermediate layer that transforms conversational input into structured rating fields is not part of any state rate filing in which Liberty Mutual operates. It does not appear in the actuarial memoranda supporting the territorial factor or vehicle symbol exhibits. Its performance, specifically the accuracy with which it translates a consumer’s statement like “I live in the west suburbs of Cleveland” into a specific five-digit garaging ZIP, is not validated against the actuarial standard that governs the rating algorithm itself.

Rate-filing reviews for auto carriers expanding into non-traditional distribution channels show a recurring pattern: the rating algorithm performs as calibrated against structured test cases, but loss-ratio divergence emerges in live production when input validation loosens at the point of sale. ASOP No. 56, Modeling, effective October 2020, requires actuaries responsible for a predictive model to document the model’s intended purpose, key assumptions, and limitations material to its application. Applied to an input-interpretation layer in a conversational quoting system, that standard demands documentation of how the model handles ambiguous geography, incomplete vehicle descriptions, and self-reported driver information that contradicts motor vehicle record sources. Those are the cases where the gap between what the consumer said and what the rating engine received is largest. They are also the cases most likely to produce an underpriced policy.

The field interpretation problem has precedent in adjacent insurance technology contexts. Claims intake systems that accept free-text description of loss events, rather than structured dropdown selection, show consistent classification variance in the range of five to fifteen percent across similarly described losses. Rating variable collection from free-text commercial applications has produced comparable results: without server-side validation at the field level, ZIP code capture error rates alone are sufficient to shift a risk from one territorial tier to another in densely populated metropolitan markets where territorial factor boundaries do not follow intuitive geographic lines. Liberty Mutual’s conversational interface collects the same variables under the same actuarial constraints, without the field-level validation that a traditional web form applies before the data reaches the rating engine.
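One way to restore field-level validation in a conversational flow is a gate between the interpretation layer and the rating engine. The sketch below is an assumption about how such a gate could work, not a description of Liberty Mutual's system: `ZipResolution` and `resolve_garaging_zip` are hypothetical names, and the candidate list stands in for whatever the interpretation model returns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ZipResolution:
    rated_zip: Optional[str]      # set only when exactly one candidate remains
    needs_follow_up: bool         # True when the consumer must be asked again
    candidates: tuple[str, ...]   # well-formed candidates, kept for the audit trail


def resolve_garaging_zip(candidates: list[str]) -> ZipResolution:
    """Pass a ZIP to rating only when the interpretation is unambiguous."""
    valid = tuple(sorted({
        c for c in candidates
        if len(c) == 5 and c.isascii() and c.isdigit()
    }))
    if len(valid) == 1:
        return ZipResolution(valid[0], False, valid)
    # Zero or several plausible ZIPs: do not pick a default, ask a follow-up.
    return ZipResolution(None, True, valid)
```

A statement like “the west suburbs of Cleveland” would typically yield several candidates, so the gate returns no rated ZIP and flags a follow-up question. The design choice is to treat ambiguity as missing data rather than letting the model assign a default that the rating engine cannot distinguish from a consumer-supplied value.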

The seven states selected for the initial rollout are not the highest-complexity markets in Liberty Mutual’s portfolio. California, New York, New Jersey, Massachusetts, and Florida carry the most heterogeneous territorial structures and the most demanding rate filing review requirements. Arizona, Kentucky, Ohio, Missouri, New Mexico, Utah, and Wisconsin represent a pilot-market configuration where territorial factor variance is moderate and the regulatory review posture for new distribution programs is less intensive. The 40-state expansion by year-end implies that the higher-complexity markets enter the conversational channel before the input-interpretation model’s actuarial performance can be validated against a full policy year of loss experience from the initial seven states.

Adverse Selection Through the Conversational Channel

Consumers who choose a conversational quoting channel over a traditional web form are not a random sample of the auto insurance market. Channel preference is a behavioral signal, and behavioral signals are correlated with risk profiles in personal lines insurance. The direction and magnitude of that correlation for conversational quoting on a foundation model platform is unknown, because no carrier has operated a production-scale program in this format long enough to accumulate credible experience data. The distribution of plausible outcomes shapes the rate adequacy planning problem regardless.

One selection hypothesis: consumers who find traditional web form friction prohibitive, because of lower digital engagement, cognitive load associated with structured disclosure, or discomfort with explicit claims history questions, may find the conversational interface more accessible. Research on digital form completion in consumer financial services shows that form-abandonment rates are higher among older consumers, those with more complex household profiles, and those with adverse driving history who anticipate a surcharge and defer by not completing the form. If the conversational channel reduces friction for this segment, it may attract a disproportionate share of higher-risk applicants who previously abandoned Liberty Mutual’s web quote flow before receiving a rated premium. The channel makes the market larger by drawing in applicants who could not or would not navigate the form. Those applicants may not be the same risk as those who did.

The inverse hypothesis is also plausible: highly engaged ChatGPT users, who tend to skew toward younger and more digitally active consumer segments, may self-select into the conversational channel with below-average risk profiles. Younger drivers with clean records and newer vehicles, the segment that benefits most from direct-channel pricing relative to captive agent placement, may account for a larger share of ChatGPT-originating policies than of the existing web-direct book. That composition would produce better loss experience rather than worse.

Either hypothesis may prove accurate, or neither may materialize at meaningful scale. The actuarial problem is the timeline. Loss development sufficient to evaluate channel-level adverse selection requires 12 to 24 months from policy inception. Liberty Mutual launched in late May 2026 and plans to reach 40-plus states before December. A structural adverse selection effect, if it exists, is embedded in policies written at expanding scale before the first policy year’s development is available. A carrier pricing a new distribution channel without channel-specific experience data is pricing on assumptions that may not hold. That is not unusual in insurance, where new programs routinely outpace their own actuarial data, but it does require explicit acknowledgment in the rate adequacy monitoring plan and a monitoring calendar calibrated to the expansion pace rather than to the standard two-year retrospective.

Rate Filing Definitions and the Input Format Question

State rate filings are formal regulatory documents that describe the rating plan, its variables, their definitions, and the actuarial basis for each factor. When Liberty Mutual filed its personal auto rating plan in Arizona, the filing described garaging ZIP code as a defined rating variable with a specific territorial factor exhibit. That description assumes a five-digit postal code as the input format. It does not describe garaging ZIP code as the geocoded output of a large language model’s interpretation of a consumer’s description of their residence location. Those are different representations of the same underlying variable, and the distinction has regulatory implications.

Whether introducing a conversational collection layer constitutes a material change to a filed rating plan, and therefore requires regulatory notification or supplemental approval, depends on state-specific interpretation of form and rate filing requirements. Most states require prior approval or at minimum notification for material changes to rating algorithms and their input definitions. The question of whether an input-parsing step qualifies as part of the rating algorithm has not been addressed by state insurance departments in published guidance for this specific technology configuration. It is a live regulatory question, not a resolved one. Liberty Mutual is operating in seven states while that question goes unanswered by regulators.

The NAIC Model Bulletin on the Use of Artificial Intelligence Systems by Insurers, adopted in December 2023 and implemented by 24 states as of mid-2026, requires insurers to maintain a documented AI System Program covering AI systems used in regulated insurance functions, including rating. The bulletin asks that the program include validation and testing to assess the quality and integrity of data used in AI system inputs. If Liberty Mutual’s input-interpretation model qualifies as an AI system participating in the rating decision, the program documentation must cover it, and the validation, bias analysis, and data quality requirements in the bulletin extend to the interpretation layer alongside the rating algorithm itself. That is a materially different documentation obligation from the actuarial support that the company’s rate filings currently provide.

ASOP No. 23, Data Quality, requires actuaries to assess the appropriateness of data used in actuarial work and to disclose material data limitations. The actuarial memorandum supporting a rate filing implicitly certifies that the rating variables were collected with sufficient accuracy to support the factor exhibits. A conversational collection layer that introduces measurement error in the garaging ZIP variable affects the territorial factor’s actuarial accuracy in ways not addressed in the original memorandum. Documenting that the interpretation model’s error rate is uncorrelated with loss experience requires the validation analysis that ASOP No. 23 and ASOP No. 56 together would expect from any model that feeds a rated variable. That documentation does not exist in the current filing record for these seven states.

The Audit Record for Conversational Quote Transactions

Traditional web-form quoting generates a structured audit record: field entry timestamps, validation error sequences, override events, and the final structured payload submitted to the rating engine. That record captures the exact inputs the consumer provided in the format the rating algorithm received them. Regulatory complaint reviews, coverage disputes, and market conduct examinations that access specific quote records depend on this documentation structure. State insurance departments that investigate consumer complaints about premium accuracy expect to receive a record that shows precisely what information was provided and how the rating algorithm applied it.

A conversational quoting transaction generates a different record. The chat transcript captures what the consumer said. The structured object passed to the rating engine captures what the interpretation model derived from that conversation. Those two records are not identical, and the gap between them is the point of potential actuarial and legal exposure. If a consumer states in conversation that their vehicle is garaged in Columbus, Ohio, and the interpretation model assigns a ZIP code in suburban Dublin rather than a downtown Columbus ZIP with a higher territorial factor, the transcript and the rating record diverge. If the vehicle is in fact garaged downtown, the quoted premium is lower than the filed rate plan requires for the actual garaging location. The documentation trail must make that derivation reconstructable by state regulators on demand.

Regulatory requirements for record retention in auto insurance rate applications vary by state, but most state market conduct standards require that quote records be retained and accessible for examination. Satisfying that standard for a web form transaction requires server-side logs of each field entry and the final structured payload. Satisfying it for a conversational transaction requires the full chat transcript, the parsed output at each field-level interpretation step, any disambiguation decisions made by the model, and the final structured payload, linked through a single transaction identifier. Whether Liberty Mutual’s current implementation captures and retains this complete chain in a format accessible to state regulators under market conduct exam requests is not addressed in the launch announcement. It should be.
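The chain described above (transcript, field-level interpretations, disambiguation decisions, final payload, one transaction identifier) can be sketched as a data structure. Everything here is hypothetical: the class names, field names, and the `unexplained_fields` check are assumptions for illustration, not a known regulatory schema or Liberty Mutual's record format.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass(frozen=True)
class FieldInterpretation:
    field_name: str        # rated variable, e.g. "garaging_zip"
    source_utterance: str  # the consumer statement the value was derived from
    parsed_value: str      # what the interpretation layer produced
    disambiguated: bool    # True if a follow-up or model default was involved


@dataclass
class QuoteAuditRecord:
    transcript: list[str]
    interpretations: list[FieldInterpretation]
    payload: dict[str, str]  # structured object sent to the rating engine
    transaction_id: str = field(default_factory=lambda: uuid4().hex)

    def unexplained_fields(self) -> list[str]:
        """Payload fields that cannot be traced back to a transcript line."""
        explained = {
            (i.field_name, i.parsed_value)
            for i in self.interpretations
            if i.source_utterance in self.transcript
        }
        return sorted(
            name for name, value in self.payload.items()
            if (name, value) not in explained
        )
```

A record whose `unexplained_fields()` list is non-empty contains rated values that an examiner could not reconstruct from the conversation. Running that check before the payload reaches the rating engine would surface the gap at quote time rather than during a market conduct exam.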

Privacy obligations add a further layer. Disclosures accompanying the Liberty Mutual ChatGPT app state that information provided in the chat could be shared with Liberty Mutual, its affiliates, and service providers. For most consumers, the implications of that sharing, in a context where the chat content includes specific vehicle, address, and driving history information, are considerably less clear than the implications of submitting the same information through a structured web form with an explicit data handling disclosure. State data privacy statutes and insurance information privacy regulations apply to the conversational collection context. The combination of insurance-regulated personal information and a third-party platform data flow creates a disclosure and compliance requirement that the launch materials address only in outline.

The Distribution Stack That Follows

Liberty Mutual’s ChatGPT launch is a first-mover claim, but the architecture is replicable. Google AI, Meta AI, and Apple Intelligence provide distribution surfaces with audiences comparable to or larger than ChatGPT’s. If conversational quoting on ChatGPT produces measurable new business volume, the expansion logic points toward an application on each major foundation model platform, with each platform’s consumer base becoming a distribution channel that the carrier accesses through an API but does not own. The analogy to the comparison site relationship of the 2010s is instructive: carriers gained volume through aggregator platforms and then found themselves competing on price in an environment they did not control. Using foundation model platforms as distribution infrastructure raises a structurally similar dependency question at a different level of the stack, with the added complication that the AI platform controls not just comparison positioning but the entire product discovery experience.

eMarketer’s analysis of the Liberty Mutual launch identified the core commercial tension precisely: the carrier controls the rating transaction, but the platform controls how the consumer discovers the product. Consumers find the Liberty Mutual ChatGPT app because ChatGPT surfaces it, and they may receive suggestions to compare other carriers through the same conversational interface. The carrier’s competitive advantage through this channel depends on platform placement dynamics it cannot dictate. That dependency is not specific to insurance. It is the structural condition of distribution on any third-party foundation model platform. But insurance adds the layer of rated risk, rate filing compliance, and actuarial accountability that most commercial applications on these platforms do not carry.

For actuaries, the distribution stack question resolves to an underwriting guideline design problem. Underwriting guidelines describe the risks a carrier accepts through each distribution channel and set the parameters within which rate adequacy is expected to hold. A new channel that modifies input collection mechanics and attracts a potentially different risk population than the existing web-direct book belongs in the underwriting guideline framework before it scales. The guidelines for the conversational channel should specify input validation requirements, risk tolerance for interpretation uncertainty, and monitoring triggers that prompt program modification if loss experience signals adverse selection. Those guidelines do not appear in the launch announcement, and their existence or absence will determine how well the pricing integrity of the conversational book is maintained as it grows toward 40-plus states.

Actuarial Monitoring After a New Channel Opens

The actuarial governance response to a new distribution channel follows a framework that insurance pricing practice has refined over decades of direct-to-consumer, agency, and broker channel experience: stratify loss development by origination channel, compare loss ratios, claims frequency, and severity distributions against the broader book, and identify material divergence within the first policy year. The conversational channel is not exempt from this framework, but it adds monitoring dimensions that the prior generation of channel stratification did not require.

Beyond the standard channel comparison, the monitoring plan for a conversational quoting program should stratify by input ambiguity. Transaction records that required model-level disambiguation, cases where the consumer’s description was unclear enough to trigger a follow-up question or a model-assigned default value, represent the highest-risk segment for input interpretation error. Tracking loss development for high-ambiguity transactions separately from low-ambiguity transactions provides the earliest available signal about whether interpretation errors correlate with loss outcomes. A positive correlation between disambiguation frequency and adverse loss experience would indicate that the interpretation model’s errors are not random noise but are systematically concentrated in exactly the cases where the rating engine is most dependent on accurate input. That is a rate adequacy problem and a potential unfair discrimination concern simultaneously, depending on which consumer populations generate the most ambiguous inputs.
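The stratified comparison can be sketched as a simple aggregation. This is a minimal illustration under stated assumptions: the input is a list of `(stratum, earned_premium, incurred_loss)` tuples, the stratum labels are invented, and a production analysis would also need loss development and credibility adjustments that are omitted here.

```python
from collections import defaultdict
from typing import Iterable


def loss_ratio_by_stratum(
    policies: Iterable[tuple[str, float, float]],
) -> dict[str, float]:
    """Incurred loss divided by earned premium, aggregated per stratum."""
    premium: dict[str, float] = defaultdict(float)
    loss: dict[str, float] = defaultdict(float)
    for stratum, earned_premium, incurred_loss in policies:
        premium[stratum] += earned_premium
        loss[stratum] += incurred_loss
    # Skip strata with no earned premium to avoid dividing by zero.
    return {s: loss[s] / premium[s] for s in premium if premium[s] > 0}


def divergence(ratios: dict[str, float], stratum: str, baseline: str) -> float:
    """Difference in loss ratio between a stratum and a baseline stratum."""
    return ratios[stratum] - ratios[baseline]
```

The same function works whether the stratum is the origination channel or an ambiguity flag such as "high_ambiguity" versus "low_ambiguity". A persistently positive `divergence` for the high-ambiguity stratum would be the early signal the paragraph above describes.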

Liberty Mutual’s 2025 full-year combined ratio of 88.4 percent, improved from 95.9 percent in 2024 and described by CEO Tim Sweeney as the company’s lowest in recent history, provides some underwriting margin to absorb modest adverse development from a new channel during the experience accumulation period. Total consolidated net written premium reached $43.6 billion in 2025. A conversational channel that begins at small scale in seven states and grows toward a 40-state deployment by year-end represents an increasing fraction of new business origination, not a small-scale pilot with containable exposure. The monitoring calendar should match the expansion pace. A first formal channel comparison at six months after the initial policy cohort binds is consistent with the rate of deployment. Waiting for full two-year development before reviewing channel performance is not adequate governance for a program being deployed across more than 40 rate filings before its first claims year develops.

The broader actuarial implication extends beyond Liberty Mutual. The company is the first to operate a production-scale conversational quoting program through a carrier-owned rating engine on a foundation model platform, but it will not be the last. The governance questions this launch raises, how the rating engine’s structured input assumption interacts with free-text collection, what regulatory notification is required when the input format changes, how adverse selection through a new channel is detected before it becomes rate inadequacy, and what audit record a conversational transaction must produce, are questions the personal auto actuarial community will answer in practice before any professional guidance or regulatory bulletin addresses them formally. The answers Liberty Mutual’s program produces over the next 18 months will define the template for every carrier that follows.
