CAS Funds $80K to Fine-Tune LLMs for P&C Actuarial Reasoning: What the Dual RFP Means for the Profession

The CAS AI Working Group's published research priorities have traced a clear progression over the past three years: from exploring what AI can do, to defining what it should do, to now funding how to build it specifically for actuarial use. The February 2026 dual Request for Proposals, offering up to $80,000 across two research tracks, represents the first time a major actuarial organization has committed funding to adapt large language models for domain-specific P&C reasoning rather than simply evaluating off-the-shelf tools.

The first RFP, "Adapting Large Language Models for Specialized P&C Actuarial Reasoning," carries a budget of up to $40,000 and targets fine-tuning, retrieval-augmented generation, and hybrid architectures that embed actuarial logic directly into model behavior. The second, "Leveraging LLMs in Unstructured Claims Data," also funded at up to $40,000, seeks reproducible frameworks for converting adjuster notes, medical records, and call transcripts into structured variables for reserving and ratemaking. Together, they address the two sides of the actuarial AI gap: reasoning that reflects professional standards, and data extraction that unlocks information actuaries currently cannot use at scale.

No publication has yet analyzed the strategic implications of a professional body investing in building its own domain-specific AI capabilities, or connected this initiative to the parallel trend of carriers and vendors training proprietary insurance models. This article provides that analysis.

Inside the Actuarial Reasoning RFP: What the CAS Is Actually Funding

The first RFP is explicit about what it is not seeking. The CAS draws a line between research that evaluates how well general-purpose LLMs handle actuarial prompts and research that investigates "the mechanisms by which LLM behavior is shaped for actuarial use." The distinction matters. Dozens of blog posts and conference presentations have tested whether ChatGPT or Claude can solve exam problems or generate loss triangles. The CAS wants something fundamentally different: models that internalize actuarial logic rather than approximating it through prompt engineering.

The eligible P&C actuarial domains span the core of casualty practice: pricing and ratemaking, reserving and loss development analysis, capital modeling and stress testing, reinsurance analysis and portfolio risk management, and emerging risks assessment. This is not a narrow research question. The CAS is inviting proposals that could reshape how any of these functions interact with AI.

The technical approaches the RFP explicitly encourages reveal the CAS's sophistication about the current state of the art. Fine-tuning existing models using parameter-efficient methods like LoRA and QLoRA is listed alongside instruction tuning, retrieval-augmented generation, training domain-specific models from scratch, hybrid systems combining LLMs with traditional actuarial models, and reinforcement learning from human feedback with actuarial input. That last approach, RLHF with actuarial practitioners providing the feedback signal, would be genuinely novel. No published research has explored what happens when the human feedback in RLHF comes specifically from credentialed actuaries evaluating model outputs against professional standards.
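
To make "parameter-efficient" concrete, here is a back-of-envelope sketch of why LoRA is cheap to train. LoRA leaves a d x k weight matrix frozen and trains two low-rank factors, B (d x r) and A (r x k), so the trainable count is r(d + k) rather than d * k. QLoRA applies the same idea on top of a quantized frozen base model. The matrix size and rank below are illustrative assumptions, not figures from the RFP.

```python
def full_finetune_params(d: int, k: int) -> int:
    """Parameters updated when a d x k weight matrix is trained directly."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Parameters updated with rank-r LoRA factors B (d x r) and A (r x k)."""
    return r * (d + k)


# Illustrative sizes only (assumptions, not taken from the RFP).
d, k, r = 4096, 4096, 8

full = full_finetune_params(d, k)
lora = lora_params(d, k, r)
fraction = lora / full

print(f"full: {full:,}  LoRA: {lora:,}  fraction trained: {fraction:.4%}")
```

Under these assumed sizes, the adapter trains well under one percent of the parameters in that single matrix, which is what makes fine-tuning feasible on a research budget of this scale.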

The deliverables reinforce the practical orientation. Researchers must produce a peer-reviewed paper, a demonstration of use cases, a system architecture description enabling reproduction, and a GitHub repository with code, data pipelines, and artifacts under the Mozilla Public License 2.0. The CAS is not funding theoretical exploration. It wants reproducible systems that other actuaries can deploy.

The evaluation criteria center on a critical requirement: performance improvements over a "clearly articulated baseline," whether that baseline is an out-of-the-box LLM, a prompt-based approach, or an existing actuarial workflow. Proposals must demonstrate meaningful behavioral change through adaptation, enhanced interpretability and validation, practical feasibility, and stability of outputs. That stability requirement addresses a known weakness of generative models in actuarial applications: the same prompt can produce materially different reserve estimates on consecutive runs, which is unacceptable for regulatory filings.
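
One way a proposal might quantify that stability requirement is to rerun the same prompt several times and measure the spread of the resulting estimates. The sketch below uses the coefficient of variation; the run values and the 0.5% tolerance are hypothetical placeholders, since the RFP does not prescribe a metric or a threshold.

```python
from statistics import mean, pstdev


def coefficient_of_variation(estimates):
    """Population standard deviation divided by the absolute mean."""
    m = mean(estimates)
    if m == 0:
        raise ValueError("mean of estimates is zero; CV is undefined")
    return pstdev(estimates) / abs(m)


def is_stable(estimates, tolerance=0.005):
    """True when run-to-run spread is within the (assumed) tolerance."""
    return coefficient_of_variation(estimates) <= tolerance


# Hypothetical reserve estimates from five runs of the same prompt.
runs = [10_250_000, 10_310_000, 9_980_000, 10_600_000, 10_120_000]

cv = coefficient_of_variation(runs)
print(f"CV across runs: {cv:.2%}  stable: {is_stable(runs)}")
```

A baseline-versus-adapted comparison could report this figure for both systems alongside accuracy, so that reviewers see whether adaptation narrowed the spread as well as improving the answer.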

The Unstructured Claims Data Track: Unlocking Information Actuaries Cannot Currently Use

The second RFP targets a different but equally consequential problem. Actuaries have always known that adjuster notes, medical records, call transcripts, and scanned documents contain information relevant to loss development and pricing. The challenge has been extracting it at scale in a form that fits actuarial workflows.

The CAS frames this as converting unstructured claims data into "categorical variables for reserving or ratemaking or both." The specific data types listed include phone call transcripts, claim notes, images, web data such as scraped data and social media posts, and scanned documents including medical records. Each of these presents distinct extraction challenges. Medical records require clinical terminology mapping. Adjuster notes use inconsistent abbreviations across carriers. Call transcripts introduce speech-to-text error rates that compound with downstream classification errors.

This RFP has already produced results. A proof-of-concept paper published on arXiv in June 2026 by Lieberthal et al., funded by the CAS AI Working Group, demonstrates a two-stage processing architecture that extracts 36 actuarial variables from synthetic FHIR-based claims data and real claims documents. The framework, validated by two independent clinical expert reviewers, achieved mean scores above 4.0 on a five-point Likert scale across 14 core variables. More significantly for practicing actuaries, the severity-segmented analysis reduced reserve estimation error from 6.5% to 4.0%. The four-script Python pipeline includes audit trails and confidence scoring, addressing the documentation requirements under ASOP No. 56.

The practical implications extend beyond the research itself. If LLMs can reliably extract structured actuarial variables from unstructured claims data, the information asymmetry between what adjusters know and what actuaries can model narrows considerably. Loss development patterns that currently rely on transaction codes and aggregate severity trends could incorporate narrative claim characteristics: litigation involvement signals from adjuster notes, treatment trajectory indicators from medical records, or settlement posture cues from recorded calls.

Why General-Purpose LLMs Fall Short for Actuarial Work

The CAS initiative arrives at a moment when the gap between general-purpose LLM capabilities and actuarial domain requirements is becoming quantifiably apparent. Caesar Balona's "ActuaryGPT" paper in the British Actuarial Journal established that LLMs can handle natural language processing tasks and serve as workflow assistants in actuarial settings, but the paper also documented the boundaries: tasks requiring precise numerical reasoning, regulatory-compliant documentation, or reproducible quantitative outputs expose fundamental limitations of prompt-based approaches.

Mario DiCaro, FCAS, who chairs the CAS AI Working Group and serves as VP of Capital Modeling and Analytics at Tokio Marine, captured the core problem in Actuarial Review: "It turns out it's extremely good at reading. Unfortunately, when it doesn't find what it's looking for, it would make stuff up." That tendency to hallucinate is inconvenient in a chatbot. In a reserve estimate or rate filing, it creates professional liability exposure.

The hallucination problem is not merely anecdotal. Research by Roy and Singh (arXiv 2602.13213) on agentic AI for commercial insurance underwriting measured baseline hallucination rates of 11.3% in underwriting decision support. Their adversarial self-critique architecture reduced that to 3.8%, but even 3.8% is orders of magnitude too high for statutory reserve opinions or rate indications that must withstand regulatory examination. An appointed actuary who signs a Statement of Actuarial Opinion cannot qualify it by noting that the underlying model fabricates data 3.8% of the time.
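
The arithmetic behind that concern is worth spelling out. The sketch below computes the relative reduction implied by the two reported rates and then, under a simplifying assumption that errors are independent across outputs, the chance that at least one fabricated item appears in a batch. The batch size of 50 is a hypothetical figure chosen for illustration.

```python
def relative_reduction(before: float, after: float) -> float:
    """Share of the original error rate that was eliminated."""
    return (before - after) / before


def prob_at_least_one(rate: float, n_outputs: int) -> float:
    """P(at least one error in n outputs), assuming independent errors."""
    return 1.0 - (1.0 - rate) ** n_outputs


baseline, improved = 0.113, 0.038  # rates reported by Roy and Singh
reduction = relative_reduction(baseline, improved)

n_outputs = 50  # hypothetical number of model outputs behind one analysis
p_any = prob_at_least_one(improved, n_outputs)

print(f"relative reduction: {reduction:.1%}")
print(f"P(at least one fabrication in {n_outputs} outputs): {p_any:.1%}")
```

Even after a reduction of roughly two-thirds, a per-output rate of 3.8% compounds quickly across a work product that relies on many outputs, which is why the per-output rate alone understates the exposure.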

The reproducibility problem compounds the hallucination issue. General-purpose LLMs are stochastic by design. Temperature settings, context window variations, and model version updates all introduce variability that conflicts with actuarial standards requiring consistent, documentable methodologies. ASOP No. 56, which governs modeling work, requires actuaries to evaluate a model's appropriateness for its intended use, assess data quality and model structure, perform validation and testing, and ensure appropriate governance. A model whose outputs shift materially between runs fails the governance test before the validation work even begins.
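
The role of temperature can be illustrated with a toy next-token sampler. This is a minimal sketch of softmax sampling with temperature, not any vendor's decoding implementation; the logits are made-up values.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index: greedy at temperature 0, softmax sampling otherwise."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(2026)
logits = [2.0, 1.5, 0.3]  # made-up scores for three candidate tokens

greedy_draws = [sample_token(logits, 0.0, rng) for _ in range(200)]
sampled_draws = [sample_token(logits, 1.0, rng) for _ in range(200)]

print("distinct tokens at T=0:", sorted(set(greedy_draws)))
print("distinct tokens at T=1:", sorted(set(sampled_draws)))
```

In this toy, setting temperature to zero removes sampling randomness entirely. It does nothing about the other sources of variability named above, such as context changes and model version updates, so a fixed temperature is a partial control rather than a governance answer.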

Domain-specific adaptation addresses these problems through architectural constraints rather than prompt engineering. Fine-tuning on actuarial corpora can reduce hallucination rates by grounding the model's probability distributions in domain-specific terminology and reasoning patterns. Retrieval-augmented generation can anchor outputs in authoritative sources like CAS papers, ASOPs, and filed rate manuals. Hybrid architectures can route numerical computation to deterministic actuarial models while using the LLM for natural language interpretation and documentation. Each of these approaches, explicitly encouraged in the CAS RFP, represents a different strategy for making LLM behavior predictable enough to meet professional standards.
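
The hybrid idea can be sketched in a few lines: deterministic code owns the arithmetic, and the language step only describes numbers it is handed. Below, a volume-weighted chain-ladder calculation plays the deterministic role, and `draft_narrative` is a hypothetical placeholder for where an LLM call would sit. The triangle is invented for illustration.

```python
def age_to_age_factors(triangle):
    """Volume-weighted development factors from a cumulative triangle.

    `triangle` is a list of rows, oldest accident year first; later
    years have fewer development periods and therefore shorter rows.
    """
    n_periods = len(triangle[0])
    factors = []
    for j in range(n_periods - 1):
        rows = [row for row in triangle if len(row) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return factors


def chain_ladder_ultimates(triangle):
    """Project each accident year's latest value to ultimate."""
    factors = age_to_age_factors(triangle)
    ultimates = []
    for row in triangle:
        value = row[-1]
        for f in factors[len(row) - 1:]:
            value *= f
        ultimates.append(value)
    return ultimates


def draft_narrative(factors, ultimates):
    """Placeholder for the LLM step: it formats, it does not compute."""
    factor_text = ", ".join(f"{f:.3f}" for f in factors)
    return (f"Selected volume-weighted factors: {factor_text}. "
            f"Indicated ultimate: {sum(ultimates):,.0f}.")


# Invented cumulative paid triangle (three accident years).
triangle = [[1000, 1500, 1650], [1100, 1650], [1200]]

factors = age_to_age_factors(triangle)
ultimates = chain_ladder_ultimates(triangle)
narrative = draft_narrative(factors, ultimates)
print(narrative)
```

The design choice is that the same triangle always yields the same factors and ultimates, regardless of how the narrative is worded, so the reproducibility burden falls on code that can be unit-tested.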

The Vendor and Carrier Context: Who Else Is Building Domain-Specific Insurance AI

The CAS investment does not occur in isolation. Carriers and vendors have been building proprietary and semi-proprietary insurance AI models with far larger budgets, and the competitive dynamics between these efforts and the profession's own research program merit examination.

The most advanced published example is INS-S1, a family of scalable insurance LLMs described in a March 2026 paper (arXiv 2603.14463). INS-S1 integrates a Verifiable Insurance Data Synthesis System with fine-grained sub-task decomposition and a multi-source synthesis framework that enforces logic constraints on actuarial reasoning. The results are striking: the model achieves a 0.6% hallucination rate while significantly outperforming DeepSeek-R1 and Gemini-2.5-Pro on domain tasks. The researchers also released INSEva, a benchmark of 39,000+ insurance-specific test samples, establishing the first comprehensive evaluation framework for insurance AI.

On the carrier side, the architectures are diverse but converging on domain specificity. AIG's multi-agent system uses Palantir Foundry and Anthropic Claude with 30-hour autonomous underwriting cycles, but the agents operate within a domain-specific ontology framework that constrains their behavior to insurance logic. State Farm's OpenAI Frontier partnership deploys Navi, trained on the mutual's proprietary policy and claims data, across 19,200 agent offices. Allstate's ALLIE platform, built on a proprietary stack rather than a foundation model vendor relationship, writes one-third of the company's software code and has cut billing escalations by 50%.

Mike McGavick, former CEO of XL Group, articulated the carrier perspective at the CAS Seminar on Reinsurance in June 2026. McGavick advocates for what he calls "domain-specific language models," or dsLMs, arguing that small, insurance-focused models address industry-specific terminology and challenges more effectively than general-purpose alternatives. His company, Mea Platform, builds insurance-specific agentic AI for workflow automation. "If I use the word model," McGavick told the audience, "it's the actuary in the end that's going to be telling the rest of us what can be trusted and what cannot."

The carrier and vendor efforts share a common characteristic that distinguishes them from the CAS initiative: they are proprietary. AIG's ontology framework, State Farm's Navi training data, Allstate's ALLIE architecture, and INS-S1's synthesis system are all controlled by the organizations that built them. The CAS is funding open research under the MPL 2.0 license, with code deposited in a public GitHub repository. This creates a fundamentally different knowledge structure: one where the profession's domain-specific AI capabilities are shared rather than siloed.

What Actuarial-Native AI Would Actually Look Like

The CAS RFP's technical specificity enables a concrete picture of what domain-adapted actuarial LLMs might produce. Consider three scenarios across the eligible domains.

Reserving and loss development analysis. A fine-tuned model trained on CAS papers, published loss development studies, and anonymized triangle data could accept a loss triangle as input and produce not just point estimates but a documented selection rationale citing relevant factors: the tail behavior of the selected development pattern, whether the data supports a Bornhuetter-Ferguson blend, how calendar-year trends in the triangle compare to industry benchmarks, and what the sensitivity of the selected ultimate looks like under alternative assumptions. The key difference from prompting a general-purpose LLM is that the fine-tuned model would draw these rationales from actuarial methodology rather than generating plausible-sounding text. The CAS's earlier $25,000 RFP from the Risk Working Group specifically sought an AI tool capable of accepting standard data input, following a particular reserving method if prompted or determining the best method based on data characteristics.
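
For readers who want the mechanics behind a "Bornhuetter-Ferguson blend" and a sensitivity check, the sketch below compares a chain-ladder ultimate with a BF ultimate for a single immature accident year, then varies the expected loss ratio. All inputs are hypothetical; the formulas are the standard textbook forms, not output from any funded model.

```python
def chain_ladder_ultimate(reported: float, cdf: float) -> float:
    """Reported losses developed to ultimate with a cumulative factor."""
    return reported * cdf


def bf_ultimate(reported: float, premium: float,
                expected_loss_ratio: float, cdf: float) -> float:
    """Bornhuetter-Ferguson: reported plus expected unreported losses."""
    pct_unreported = 1.0 - 1.0 / cdf
    return reported + premium * expected_loss_ratio * pct_unreported


# Hypothetical inputs for one immature accident year.
reported, premium, cdf = 1200.0, 3000.0, 1.65

cl = chain_ladder_ultimate(reported, cdf)
bf_by_elr = {elr: bf_ultimate(reported, premium, elr, cdf)
             for elr in (0.60, 0.65, 0.70)}

print(f"chain ladder ultimate: {cl:,.1f}")
for elr, ult in bf_by_elr.items():
    print(f"BF ultimate at ELR {elr:.0%}: {ult:,.1f}")
```

A documented selection rationale would explain why one of these figures was chosen and how far the answer moves across the tested loss ratios, which is the kind of reasoning the paragraph above asks a fine-tuned model to ground in methodology.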

Pricing and ratemaking. A hybrid architecture could combine an LLM's ability to interpret rate filing narratives, competitor filings, and regulatory correspondence with a deterministic pricing model's numerical precision. The LLM component could extract rating factor changes from competitor filings, summarize regulatory objections from prior filing correspondence, and draft filing narratives that anticipate DOI concerns. The pricing model would handle the mathematics. This division of labor maps naturally to how actuarial teams already work: analysts handle the computational pipeline while managers draft the narratives and respond to regulators.

Reinsurance analysis. RAG-augmented models anchored in treaty language databases, catastrophe model documentation, and historical placement data could accelerate the manual-intensive process of reviewing and comparing reinsurance program structures. The model could flag inconsistencies between treaty terms and placement summaries, identify coverage gaps relative to the insurer's risk profile, and generate comparison matrices across program alternatives. The RAG architecture ensures that every output is traceable to a source document, which matters for the audit trail requirements under ASOP No. 23 (Data Quality) and ASOP No. 56.
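
The traceability property can be shown with a deliberately simple retriever. Production RAG systems typically rank passages with vector embeddings; this sketch swaps in keyword overlap so it runs with no dependencies. The document names and contents are invented, and the answer text is a stub standing in for what a model would generate from the retrieved passages.

```python
def tokenize(text):
    """Lowercased whitespace tokens as a set."""
    return set(text.lower().split())


def retrieve(query, corpus, top_k=2):
    """Return ids of the documents sharing the most words with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(text)), doc_id)
              for doc_id, text in corpus.items()]
    ranked = sorted((s for s in scored if s[0] > 0),
                    key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in ranked[:top_k]]


def answer_with_sources(query, corpus):
    """Refuse to answer when nothing in the corpus supports the query."""
    sources = retrieve(query, corpus)
    if not sources:
        return {"answer": None, "sources": []}
    return {"answer": f"Found {len(sources)} supporting document(s).",
            "sources": sources}


# Invented mini-corpus standing in for treaty and placement documents.
corpus = {
    "treaty_2025_xol.txt": "per occurrence excess of loss retention 5 million limit 20 million",
    "placement_summary.txt": "excess of loss retention 10 million limit 20 million",
    "cat_model_notes.txt": "hurricane model vendor version notes",
}

supported = answer_with_sources("excess of loss retention", corpus)
unsupported = answer_with_sources("cyber aggregation", corpus)
print(supported)
print(unsupported)
```

Two behaviors matter for an audit trail: every answer carries the ids of the documents it relied on, and a query with no supporting document returns nothing rather than a guess. In this invented corpus, the two retrieved documents also disagree on the retention, which is the kind of inconsistency the paragraph describes flagging.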

ASOP Compliance: The Governance Layer That Makes or Breaks Domain AI

Every scenario above runs through the same professional-standards bottleneck: ASOP No. 56 requires that any model used in actuarial work, including AI-assisted models, meet documentation, validation, and governance standards. The actuary who signs an opinion based partly on AI-generated analysis bears the same professional responsibility as if they had performed the analysis manually. This is not a theoretical constraint. It shapes every design decision in actuarial AI.

ASOP No. 56 requires the actuary to evaluate the model's appropriateness for its intended use, assess the quality of data and assumptions feeding the model, perform validation testing and analysis sufficient to be confident in the results, document the model's structure and any known limitations, and disclose material weaknesses. For general-purpose LLMs, meeting these requirements is difficult because the model's training data, internal reasoning, and failure modes are opaque. For domain-adapted models, compliance becomes architecturally achievable: the fine-tuning dataset is documented, the RAG corpus is auditable, the validation framework can be tested against known actuarial benchmarks, and the governance constraints are built into the system rather than bolted on afterward.

The CAS RFP explicitly requires an "interpretability and explainability framework for outputs" and "governance, appropriate use, and implementation considerations" as deliverables. This is the CAS ensuring that whatever the researchers build, it can be deployed within the existing professional standards framework rather than requiring new standards to accommodate it.

The American Academy of Actuaries reinforced this point in its 2026 professionalism guidance on generative AI: if an actuary uses GenAI tools in their work, "you bear the same professional responsibility for the output as you would for any model." Documentation of how AI was used, what validation was performed, and what limitations were identified is not optional. Domain-specific models that produce auditable, reproducible outputs aligned with actuarial methodology are inherently easier to govern than general-purpose models whose behavior varies with prompt construction.

The CAS AI Working Group: Three Years of Institutional Building

The dual RFP did not emerge from a vacuum. The CAS AI Working Group, chaired by DiCaro and comprising approximately 25 to 30 volunteers ranging from recent graduates to retirees with geographic diversity spanning Africa and Asia, has been systematically building the infrastructure for actuarial AI research.

The progression is visible in the group's outputs. The AI Primer, developed in response to demand expressed at the CAS Spring Meeting Town Hall, provides foundational AI education for practicing actuaries. The AI Fast Track Program, delivered in partnership with Akur8, offers an eight-session bootcamp covering gradient boosting, neural networks, reinforcement learning, and LLMs, earning participants a Certificate in Advanced AI for Actuarial Science and up to 9 CE credits. The AI Tools and Resources page on casact.org consolidates research, education, and the Almost Nowhere publication exploring data science and actuarial thinking in P&C insurance.

Each of these outputs built capacity for the current research program. The AI Primer established shared vocabulary. The Fast Track Program trained a cohort of actuaries who understand ML techniques well enough to evaluate research proposals. The Resources page created a distribution channel for research outputs. The dual RFP is the next step: producing the actual domain-specific tools and methodologies that trained actuaries can deploy.

The working group's ambitions extend beyond the current RFPs. DiCaro envisions "an active community where people are staying connected with each other and learning from the work done by the group" within three years. The 2026 Reserves Call Paper Program on improved methodologies and technologies for reserving, combined with the AI-assisted reserving tool initiative (a separate funded project seeking to build an AI tool that helps reserving actuaries at every stage from methodology selection to peer review documentation), signals that the CAS sees domain-specific AI as a permanent research priority rather than a one-off investment.

Implications for the CAS Syllabus and Continuing Education

If the funded research produces viable domain-specific tools, the downstream effects on actuarial education and credentialing could be substantial. The SOA and CAS credentialing pathways already reflect the profession's recognition that AI skills are no longer optional. The CAS Exam 7 syllabus includes enterprise risk management topics that increasingly require understanding of model risk in AI systems. The SOA's ATPA assessment tests ML implementation skills. But neither exam pathway currently covers the specific competencies needed to develop, validate, or govern domain-specific AI models.

If actuarial-native LLMs become standard tools in pricing and reserving workflows, the profession will need practitioners who understand not just how to use these tools but how to validate their outputs, monitor for drift, and document their behavior for regulatory purposes. This creates a potential new continuing education pathway: beyond the current AI Fast Track program's focus on understanding ML techniques, a second tier focused on actuarial AI governance, validation, and deployment. The CAS's requirement that funded researchers present at CAS meetings or seminars ensures that research findings flow directly into the profession's educational infrastructure.

Timeline and What to Watch

The RFP timeline provides concrete milestones. Proposals for the actuarial reasoning track closed April 27, 2026. Researcher notification is scheduled for June 12, 2026. An interim report is due August 28, 2026, with final deliverables on October 2, 2026. The claims data track has already produced its first output in the Lieberthal et al. paper.

Three signals will indicate whether this initiative delivers on its ambitions. First, whether the funded research produces measurable performance improvements over baseline general-purpose LLMs on actuarial tasks. The RFP requires this comparison explicitly, so the results will be quantifiable. Second, whether other actuarial organizations follow. The SOA's AI research program, the IAA's governance framework work, and the IFoA's risk assessment publications all address AI, but none have funded domain-specific model development. If the CAS demonstrates success, competitive pressure among professional bodies could accelerate adoption. Third, whether carriers and vendors engage with the open-source outputs. The MPL 2.0 licensing means any carrier can incorporate CAS-funded research into proprietary systems. If major carriers begin building on publicly available actuarial AI infrastructure rather than developing everything in-house, the economics of domain-specific AI shift in the profession's favor.

The $80,000 combined budget is modest compared to carrier AI investments. AIG's Palantir-powered system, Allstate's ALLIE platform, and State Farm's OpenAI partnership each involve orders of magnitude more capital. But the CAS investment is not trying to compete with carrier spending. It is establishing the profession's independent capacity to evaluate, validate, and contribute to actuarial AI development. In a landscape where carriers choose between platform, partnership, and proprietary AI architectures, the profession's ability to set domain-specific quality standards for any architecture may matter more than building the most expensive model.

McGavick's framing at the CAS Seminar on Reinsurance captures the stakes: "Every day that moves forward, actuaries become more important in the entire insurance ecosystem." The question is whether the profession builds the tools to fulfill that role or remains dependent on vendor and carrier implementations that it can evaluate but not shape. The CAS dual RFP represents a bet on the former.

Sources

CAS, "2026 Request for Proposals: Adapting Large Language Models (LLMs) for Specialized P&C Actuarial Reasoning," February 2026 - casact.org
CAS, "Deadline Extended! 2026 Request for Proposals: Adapting LLMs for Specialized P&C Actuarial Reasoning" - casact.org
CAS, "Leveraging LLMs in Unstructured Claims Data: The CAS Issues a Research RFP for an Actuarial Solution" - casact.org
CAS, "Using Artificial Intelligence as Actuarial Tools Spurs CAS Research" - casact.org
CAS, "AI Tools and Resources" - casact.org
Actuarial Review, "Behind the Scenes at the CAS Artificial Intelligence Working Group," November 2025 - ar.casact.org
Lieberthal, R. et al., "Leveraging LLMs for Unstructured Claims Data Analysis," arXiv 2606.06089, June 2026 - arxiv.org
Roy, J. and Singh, S., "Agentic AI for Commercial Insurance Underwriting with Adversarial Self-Critique," arXiv 2602.13213, February 2026 - arxiv.org
"An Industrial-Scale Insurance LLM Achieving Verifiable Domain Mastery and Hallucination Control without Competence Trade-offs" (INS-S1), arXiv 2603.14463, March 2026 - arxiv.org
Balona, C., "ActuaryGPT: Applications of Large Language Models to Insurance and Actuarial Work," British Actuarial Journal, vol. 29, 2024 - cambridge.org
Carrier Management, "Exclude It, Harness It, Get Greedy: McGavick's Take on Insurers' AI Playbook," June 2026 - carriermanagement.com
ASOP No. 56, Modeling - actuarialstandardsboard.org
American Academy of Actuaries, "Actuarial Professionalism Considerations for Generative AI," 2026 - actuary.org
