This is the second of three articles analyzing AIG’s granted AI underwriting patents. Read Patent #1: How AIG Separates Tables from Text for the foundational extraction architecture, and Inside AIG’s Agentic AI Underwriting Machine for the full strategy overview.

Executive Summary

The first patent in AIG’s AI underwriting portfolio (U.S. Patent 12,437,155, analyzed in our companion article) describes how the system extracts data from insurance documents. This second patent answers an equally critical question: how does the system prove that its extractions are correct?

U.S. Patent 12,437,154, titled “Information extraction system for unstructured documents using retrieval augmentation providing source traceability and error control,” was filed on the same date as its companion (January 24, 2025), granted on the same date (October 7, 2025), and lists the same two inventors: Lei Zhang and Christopher Cirelli. While the two patents share underlying document processing architecture, this patent’s claims are focused specifically on traceability, auditability, and error detection.

The core invention is a system that assigns unique identifiers to every chunk of text and every table fragment extracted from source documents; requires the LLM to report which specific chunks it used when producing an extraction; stores those chunk identifiers alongside the extracted data for downstream verification; maintains creation timestamps and version histories so that document changes over time do not break the audit trail; and generates citation lists and user interfaces that link extracted data back to its source material.

This is not an incremental feature added to an extraction system. It is a governance architecture designed from the ground up to solve problems that actuaries, compliance officers, and regulators increasingly identify as blockers to AI adoption in insurance.

The patent explicitly acknowledges its regulatory context. Its text states that in “regulated industries, it may be necessary to include the reference material (e.g., as a footnote or citation) to show that the system is accurately populating the data elements and/or is unbiased.” For an industry where the NAIC’s Model Bulletin on the Use of Artificial Intelligence Systems by Insurers now requires governance frameworks, testing, and documentation of AI systems used in insurance decision-making, a system that bakes auditability into the extraction architecture rather than treating it as an afterthought has significant implications.

The patent’s approach to hallucination detection is particularly notable. Rather than relying on post-hoc checks of LLM output, the system builds detection into the extraction workflow itself. By requiring the LLM to identify the specific chunks it used for each extraction and then comparing those cited chunks against the actual source documents, the system can identify cases where the LLM generated information that was not present in the source material. This is a structural approach to the hallucination problem that goes beyond the prompt-engineering techniques most implementations rely on.

For actuaries navigating the gap between management’s enthusiasm for AI deployment and the profession’s standards for model governance, this patent offers a concrete technical template for how traceability, validation, and auditability can be embedded into an LLM-powered workflow from its foundation.

Patent Details

Patent Number: U.S. 12,437,154
Title: Information extraction system for unstructured documents using retrieval augmentation providing source traceability and error control
Filed: January 24, 2025
Granted: October 7, 2025
Assignee: American International Group, Inc. (New York, NY)
Inventors: Christopher Cirelli (Roswell, GA); Lei Zhang (New York, NY)
Application No.: 18/831,432
Classification: G06F 40/289 (Natural Language); G06F 11/34; G06F 16/3329
Primary Examiner: Thierry L. Pham

The Problem: LLMs Cannot Self-Certify Their Own Accuracy

The challenge this patent addresses is fundamental to deploying large language models in any regulated industry, but it is especially acute in insurance underwriting.

When an LLM extracts data from a broker submission, there is no inherent mechanism to verify that the extracted value came from the source document. The model processes the prompt and the provided document chunks, then generates a response. If the response is factually correct, the system proceeds. But if the response is incorrect, the failure can take several forms, each with different consequences.

The LLM may extract the wrong value from the correct source. For instance, it might pull a total insured value from the wrong row of a financial table. This is a retrieval accuracy problem that can often be addressed by improving the chunking and retrieval strategy (as covered in Patent #12,437,155).

The LLM may generate a plausible value that does not appear anywhere in the source documents. This is the hallucination problem. The model produces output that looks syntactically correct and falls within expected ranges but is fabricated. In insurance underwriting, a hallucinated named insured, a fabricated coverage limit, or an invented loss history figure could propagate through pricing models, binding decisions, and policy issuance with serious financial and regulatory consequences.

The LLM may extract correctly from a source document that has since been updated. If the system populated a data element from a document six months ago and that document has been revised, the extracted data may no longer be accurate. Without version tracking, there is no way to know which version of a source document informed a given extraction.

Standard approaches to these problems rely on manual review of LLM outputs, which negates the efficiency gains automation is supposed to deliver, or on simple validation checks against expected data types and ranges, which catch formatting errors but not substantive inaccuracies. AIG’s patent describes a more systematic approach.

The Architecture: Traceability From Ingestion to Output

The traceability system described in Patent 12,437,154 operates across the entire document processing lifecycle, from initial ingestion through to report generation. It is not a validation layer bolted onto the end of the pipeline. It is woven into every step.

Chunk-Level Identification

At the point of document ingestion, when source documents are separated into table chunks and text chunks (using the methodology described in the companion patent), a “chunk tracer” component assigns metadata to each chunk. This metadata includes a chunk identifier that uniquely identifies the chunk within the system, a document identifier that links the chunk to its source document, a page identifier that links the chunk to the specific page within the source document, and a chunk type flag that indicates whether the chunk contains tabular data or text data.
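To make the chunk tracer’s metadata concrete, the record it attaches to each chunk might be modeled as follows. This is a minimal sketch, not AIG’s implementation; the field and class names are illustrative, and the patent leaves the identifier scheme open (a UUID is used here purely for demonstration).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
import uuid


class ChunkType(Enum):
    TABLE = "table"   # chunk contains tabular data
    TEXT = "text"     # chunk contains free text


@dataclass
class ChunkMetadata:
    """Traceability record attached to every chunk at ingestion."""
    document_id: str          # links the chunk to its source document
    page_id: int              # page within the source document
    chunk_type: ChunkType     # table vs. text flag
    chunk_id: str = field(
        default_factory=lambda: str(uuid.uuid4())  # unique within the system
    )
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# Example: tag a table chunk taken from page 4 of a loss run
meta = ChunkMetadata(document_id="loss-run-2025-Q1", page_id=4,
                     chunk_type=ChunkType.TABLE)
```

Because the same record is stored both with the chunk and in the vector store, any retrieval hit can be resolved back to a document and page without a separate lookup service.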

These identifiers are stored both with the chunks themselves and within the vector store used for the semantic search index. This means that at every point in the retrieval and extraction process, any chunk can be traced back to its exact source location.

LLM-Reported Source Attribution

This is where the patent’s approach becomes architecturally distinct from standard RAG implementations. When the system sends a prompt and a set of relevant chunks to the LLM for extraction, the prompt includes not only the extraction request and the chunk content but also a specific instruction for the LLM to identify which chunks it actually used in producing its response.

Each chunk provided to the LLM carries its unique identifier. The LLM is instructed to include in its response the identifiers of the chunks that informed its extraction. The system then stores these “used chunk” identifiers alongside the extracted data. This creates a direct, verifiable link between any extracted data element and the specific source material that the LLM relied on.

The patent notes that the identifiers can be globally unique or scoped to the individual prompt. For example, if 23 chunks are provided with a prompt, the integers 1 through 23 can serve as identifiers for that prompt’s scope. The important point is that the system can always determine which source material the LLM cited for a given extraction.
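The prompt-scoped numbering can be sketched as follows: chunks are numbered 1..N for a single prompt, the LLM is asked to cite those numbers, and the system maps the cited numbers back to global chunk identifiers. This is an illustrative reconstruction under assumptions (the JSON response contract and function names are mine, not the patent’s).

```python
import json


def build_prompt(question: str, chunks: list[dict]) -> tuple[str, dict[int, str]]:
    """Number chunks 1..N for this prompt and ask the LLM to cite them.

    Returns the prompt text plus a map from each prompt-scoped integer
    back to the chunk's global identifier.
    """
    local_to_global = {i: c["chunk_id"] for i, c in enumerate(chunks, start=1)}
    numbered = "\n\n".join(
        f"[Chunk {i}]\n{c['text']}" for i, c in enumerate(chunks, start=1)
    )
    prompt = (
        f"{question}\n\n{numbered}\n\n"
        'Answer with JSON: {"value": ..., "used_chunks": [<chunk numbers>]}'
    )
    return prompt, local_to_global


def parse_response(raw: str, local_to_global: dict[int, str]) -> dict:
    """Extract the value and resolve cited chunk numbers to global IDs."""
    payload = json.loads(raw)
    return {
        "value": payload["value"],
        "used_chunk_ids": [local_to_global[i] for i in payload["used_chunks"]],
    }


chunks = [{"chunk_id": "doc1-p2-c0", "text": "Total insured value: $4.2M"},
          {"chunk_id": "doc1-p3-c1", "text": "Named insured: Acme Corp"}]
prompt, mapping = build_prompt("What is the total insured value?", chunks)
# Simulated LLM reply citing chunk 1:
result = parse_response('{"value": "$4.2M", "used_chunks": [1]}', mapping)
```

The stored record then carries `doc1-p2-c0` rather than the throwaway integer, so the audit trail survives beyond the life of the prompt.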

Hallucination Detection Through Source Verification

The traceability architecture enables a structural approach to hallucination detection. If the LLM reports using chunks 3, 7, and 15 to extract a data element, the system can retrieve the actual content of chunks 3, 7, and 15 and compare them against the extracted value. If the extracted value does not appear in any of the cited chunks, the system flags the response as potentially hallucinated.
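The cited-chunk comparison could look like the following sketch. A production system would normalize numbers, currencies, and whitespace before comparing; this simplified version (my own illustration, not the patent’s algorithm) uses a plain substring check.

```python
def verify_against_sources(extracted_value: str,
                           cited_chunk_ids: list[str],
                           chunk_store: dict[str, str]) -> bool:
    """Return True if the extracted value appears in at least one of the
    chunks the LLM claims to have used; False flags a potential
    hallucination for manual review."""
    needle = extracted_value.strip().lower()
    return any(needle in chunk_store[cid].lower() for cid in cited_chunk_ids)


chunk_store = {"c3": "Policy limit: $5,000,000 per occurrence",
               "c7": "Deductible: $25,000"}

verify_against_sources("$5,000,000", ["c3", "c7"], chunk_store)  # supported
verify_against_sources("$9,000,000", ["c3", "c7"], chunk_store)  # flag: not in any cited chunk
```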

The patent describes a “response validator” component that performs multiple checks on each LLM response. Each prompt template stores information about the expected response: data type, acceptable length, numeric range, and other constraints. The response validator executes these checks and, when a potential error is detected, stores additional tracing information (chunk identifiers, document identifiers, page identifiers) with the flagged response. This information is then available to the underwriter through the user interface for manual verification.

The patent describes this as operating even when no explicit error is detected: “In some embodiments, the response validator may store the tracing information with all responses even if no error occurs, for example, for display or regulatory purposes.” This blanket traceability posture is significant for regulated environments where the question is not just “was this extraction correct?” but “can you prove it was correct, and from what source?”

Temporal Traceability: Timestamps and Version Histories

Source documents in insurance underwriting change over time. Loss runs update quarterly. Financial statements are restated. Applications are amended. If a system extracts data from a document and that document later changes, the traceability chain is broken unless the system tracks versions.

Patent 12,437,154 addresses this with two mechanisms. First, each chunk carries a creation timestamp (when the chunk was generated from the source document) and an access timestamp (when the chunk was last used by the LLM for extraction). Second, the system maintains version histories: when a source document is updated, new chunks are created and the old chunks are retained for traceability but decommissioned from active retrieval. The system can link chunks from different versions of the same document.

This means that a compliance officer reviewing an underwriting decision months after the fact can determine exactly which version of a source document was used for each extracted data element and whether that document has been updated since the extraction occurred.

Citation Generation and User Interface

The final layer of the traceability architecture is output-facing. The patent describes a system that generates user interfaces displaying extracted data alongside citations to the source documents and pages from which the data was derived. It also describes generating citation lists for inclusion in reports, presentations, and other documents that require source attribution.
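Because every extraction already carries its used-chunk identifiers, citation generation reduces to a formatting pass over stored metadata. A minimal sketch (the citation format and field names below are illustrative, not the patent’s):

```python
def format_citations(extraction: dict, chunk_meta: dict[str, dict]) -> list[str]:
    """Render a citation list for an extracted data element from its
    stored used-chunk identifiers."""
    citations = []
    for cid in extraction["used_chunk_ids"]:
        meta = chunk_meta[cid]
        citations.append(
            f"{extraction['field']}: {meta['document']}, p. {meta['page']} "
            f"(chunk {cid}, {meta['chunk_type']})"
        )
    return citations


chunk_meta = {"c3": {"document": "2024 Loss Run.pdf", "page": 2,
                     "chunk_type": "table"}}
extraction = {"field": "Total Incurred Losses", "value": "$1,250,000",
              "used_chunk_ids": ["c3"]}
format_citations(extraction, chunk_meta)
# -> ["Total Incurred Losses: 2024 Loss Run.pdf, p. 2 (chunk c3, table)"]
```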

This is not merely a convenience feature. In insurance underwriting, particularly in lines like D&O liability and environmental insurance where underwriting decisions may face regulatory or legal scrutiny, the ability to produce a citation-backed data trail from extracted information back to source documents is a compliance requirement in many jurisdictions.

Mapping to AIG’s Public Disclosures and the “Critic Agent” Concept

The traceability architecture described in this patent maps directly to several aspects of AIG’s publicly disclosed AI strategy.

The Critic Agent. On the Q4 2025 earnings call, AIG described three types of AI agents in its orchestration layer: knowledge assistants that provide real-time information, adviser agents that generate insights from historical cases, and critic agents that “challenge recommendations and decisions.” The response validator and chunk-level verification described in this patent are the technical implementation of the critic agent concept. The system does not simply accept LLM output. It structurally verifies the output against the source material the LLM claims to have used.

Ontology Auditability. Zaffino told Insurance Journal in August 2025 that AIG’s “ontology will create a clear record of any actions taken, which will inform business logic and provide the ability to audit agents’ activities.” Patent 12,437,154 describes exactly how this auditability works at the technical level: through chunk identifiers, document identifiers, page identifiers, timestamps, version histories, and citation generation, all stored within the ontological data store.

The 75% to 90%+ Accuracy Improvement. AIG reported that data accuracy in underwriting processes improved from approximately 75% to over 90% after deploying its AI system. The traceability and error control architecture described in this patent is part of how that improvement was achieved and measured. Without a system that tracks which extractions were correct and which required correction, there would be no way to quantify the accuracy rate. The response validator’s ability to detect and flag errors, combined with the chunk-level tracing that allows root cause analysis of failures, creates the feedback loop necessary for continuous accuracy improvement.

Regulatory Compliance Positioning. The patent’s explicit reference to “regulated industries” and the need for citation and unbiased documentation is not incidental language. AIG operates in a regulatory environment where 24 states have now adopted the NAIC’s Model Bulletin on the Use of Artificial Intelligence Systems by Insurers, Colorado’s AI Act is set to take effect with requirements for bias testing and impact assessments, and state insurance departments are increasingly asking carriers to document how AI systems are used in underwriting and claims decisions. Building citation generation and source traceability into the patent architecture positions AIG to demonstrate compliance at a technical level, not merely a policy level.

What the Claims Protect

Patent 12,437,154 contains 20 granted claims covering methods, systems, and computer-readable media.

Claims 1 through 9 cover the core traceability method: generating chunks from a document, assigning identifiers, identifying relevant chunks for a prompt, transmitting the prompt to an LLM with both an extraction request and a request for the LLM to identify which chunks it used, and storing the extracted information with the used chunk identifiers. Dependent claims cover citation-backed user interfaces (Claim 2), error-triggered citation display (Claim 3), report generation with citations (Claim 4), table/text chunk type designation (Claim 5), citation list generation (Claim 6), timestamp recording (Claim 7), LLM error reporting (Claim 8), and PDF as the source format (Claim 9).

Claims 10 through 16 cover the system embodiment, protecting the ingestion manager, retrieval manager, and generative AI manager as a coordinated system. Notably, Claim 12 specifically protects separate traceability records for table chunks versus text chunks, and Claim 15 protects globally unique identifiers with version history and cross-version linking.

Claims 17 through 20 cover the computer-readable medium embodiment, with Claim 20 adding creation timestamps, access timestamps, usage counts, and error flags to the traceability records.

The breadth of coverage here is significant. Any competitor implementing a RAG-based extraction system that includes LLM-reported source attribution and chunk-level traceability in an insurance context would need to evaluate these claims carefully.

Why This Patent Matters Most for Actuaries

Of the three patents in AIG’s AI underwriting portfolio, this one has the most direct relevance to actuarial practice and professional standards.

ASOP No. 56 alignment. ASOP No. 56 (Modeling) requires actuaries to ensure appropriate model governance and controls, assess the quality of data and assumptions, perform validation and testing, and disclose material limitations. The traceability architecture in this patent provides a technical framework for meeting each of these requirements when the “model” is an LLM-based extraction system. Chunk-level tracing supports data quality assessment. The response validator supports validation and testing. Version histories and citation generation support disclosure requirements.

The governance gap problem. As we analyzed in The AI Governance Gap in Actuarial Practice, one of the central challenges facing the profession is that AI systems are being deployed faster than governance frameworks are being developed. This patent demonstrates that it is possible to build governance into the system architecture itself rather than trying to layer it on after deployment. The traceability, version control, and citation capabilities described here are not manual compliance processes. They are automated, systematic, and built into every extraction the system performs.

Hallucination risk in actuarial workflows. For actuaries who use AI tools for data extraction, analysis, or report generation, the hallucination problem is not theoretical. An LLM that fabricates a loss ratio figure, invents a reinsurance treaty term, or misattributes a financial statement value can introduce errors that propagate through pricing models, reserve estimates, and capital adequacy calculations. The source verification approach in this patent, where the LLM is required to cite its sources and those citations are verified against actual source material, provides a model for how hallucination risk can be managed structurally rather than relying on prompt engineering alone.

Regulatory examination preparedness. As state insurance departments begin conducting examinations of carriers’ AI systems (a trend that the NAIC Model Bulletin is accelerating), the ability to produce a complete audit trail from any underwriting data element back to its source document will become a competitive advantage and potentially a compliance requirement. The system described in this patent generates that audit trail automatically as a byproduct of its normal operation.

Model risk management for AI systems. Enterprise risk management actuaries increasingly face questions about how to categorize and govern AI systems within their model risk frameworks. This patent provides a concrete reference architecture for what AI model governance looks like when implemented at the system level: automated validation, source attribution, version tracking, error detection, and citation generation. It is a more complete answer than most frameworks currently offer.

What Comes Next in This Series

The first patent in this series described how AIG’s system ingests and extracts data from insurance documents. This second patent described how the system ensures those extractions are traceable, verifiable, and auditable. The third and final patent addresses a specific and technically challenging document type that is ubiquitous in insurance underwriting: the complex, multi-table spreadsheet.

Patent #12,511,320 introduces a chain-of-thought prompting methodology that guides the LLM through a structured reasoning process to identify individual tables within a single spreadsheet, extract metadata including unit conversions for financial data, and reconstruct each table for independent processing. It also introduces a multi-modal language model (MMLM) architecture for processing image-based documents, including handwritten forms and scanned questionnaires. For actuaries who have ever struggled with the data quality issues that arise from processing complex financial schedules, this third patent describes how AIG is solving that problem at the system level.

Sources

  1. U.S. Patent No. 12,437,154, “Information extraction system for unstructured documents using retrieval augmentation providing source traceability and error control,” filed Jan. 24, 2025, granted Oct. 7, 2025. Assignee: American International Group, Inc. Inventors: Lei Zhang, Christopher Cirelli. Justia
  2. U.S. Patent No. 12,437,155, “Information extraction system for unstructured documents using independent tabular and textual retrieval augmentation,” filed Jan. 24, 2025, granted Oct. 7, 2025. Assignee: American International Group, Inc. Justia
  3. AIG Q4 2025 Earnings Call Transcript, Yahoo Finance, February 11, 2026. finance.yahoo.com
  4. “AIG AI: From Digital Twins to Underwriting Agents.” Coverager, February 15, 2026. coverager.com
  5. “AIG CEO Zaffino Highlights Integration of GenAI to Create Digital Twin of Business.” Insurance Journal, August 12, 2025. insurancejournal.com
  6. “Insurance Giant AIG Deploys Agentic AI with Orchestration Layer.” AI News, February 17, 2026. artificialintelligence-news.com
  7. NAIC Model Bulletin, “Use of Artificial Intelligence Systems by Insurers,” December 2023, updated 2025. naic.org
  8. Actuarial Standard of Practice No. 56, “Modeling,” Actuarial Standards Board, effective October 1, 2020. actuarialstandardsboard.org
  9. “AIG’s Zaffino: Outcomes From AI Use Went From ‘Aspirational’ to ‘Beyond Expectations’.” Insurance Journal, February 17, 2026. insurancejournal.com
  10. “‘Speed Drives Growth’ as Gen AI Accelerates Underwriting: AIG CEO.” Reinsurance News, November 7, 2025. reinsurancene.ws
