Key Takeaways
- 67% of insurance carriers are piloting GenAI; only 7% have scaled enterprise-wide, placing insurance near the bottom of all industries for AI operationalization despite being a data-rich sector (BCG Build for the Future 2024 Global Study).
- Data infrastructure is the core failure point, not model quality. PwC identifies it as "the biggest challenge by far," with most carrier data trapped in batch-processing legacy systems incompatible with real-time AI pipelines.
- 70% of IT budgets are consumed by legacy infrastructure maintenance, leaving carriers structurally underfunded for the data modernization that production AI requires.
- The 7% who've scaled share a consistent pattern: they invest $25M or more in AI (versus under $5M for most), and they treat data infrastructure as a prerequisite rather than a parallel workstream.
- Every quarter in pilot purgatory compounds the competitive gap. Early AI adopters are achieving 20-40% cost reductions in claims and back-office operations — savings that pilot-stage carriers are still projecting while scaled competitors bank them.
The insurance industry has a GenAI problem that has nothing to do with the technology. According to BCG's Build for the Future 2024 Global Study, 67% of carriers are actively piloting generative AI, but only 7% have successfully scaled those initiatives enterprise-wide. For every carrier that has operationalized GenAI across its business, roughly nine more are running a pilot in perpetual proof-of-concept limbo. The global insurance AI market is projected at $8.6 billion in 2025 alone, and the majority of that spend is producing demonstrations, not deployments.
The instinct is to blame the vendors, the models, or the regulatory environment. That instinct is wrong. The data consistently points to carriers' own infrastructure decisions — made over decades, not months — as the primary execution barrier.
67% Piloting, 7% Scaled: The Vanity Metric Hiding a Real Crisis
The 67-versus-7 spread is not an adoption lag. It is a structural failure. Deloitte's survey of insurers found that 76% have implemented GenAI in at least one business function, a figure that rises to 82% among life and annuity carriers. Piloting is near-universal. Scaling is rare to the point of statistical irrelevance.
What makes this particularly damaging is that the insurance sector entered the GenAI era with genuine structural advantages: vast proprietary datasets accumulated over decades, defined actuarial workflows that lend themselves to automation, and a regulatory environment that creates moats for incumbents who execute. Those advantages have not translated into scaled deployment, which means the problem is not competitive positioning or model capability.
MIT's NANDA State of AI in Business 2025 report, based on interviews with 52 executives, surveys of 153 leaders, and analysis of 300 public AI deployments, found that 95% of GenAI pilots across industries delivered no measurable P&L impact. Insurance, despite its data advantages, is not beating that average by much. Pilots persist because they are politically safe; production deployments require accountability that most carrier organizations are not structured to absorb.
Stop Blaming the Models — Insurance's AI Problem Is a Data Infrastructure Problem
PwC's analysis of GenAI ROI in insurance is direct: data infrastructure is "the biggest challenge by far." GenAI models are commoditizing faster than carriers can modernize the data layers those models need to operate. The irony is sharp. Insurers have been collecting structured data longer than almost any other industry, yet that data is trapped in systems that cannot feed modern AI pipelines in anything approaching real time.
Fifty-seven percent of insurance decision-makers cited legacy IT system integration as a significant barrier to GenAI adoption in an EY survey. That figure understates the problem because it measures perceived barriers, not actual architectural constraints. The real constraint is that insurance data lives in systems optimized for batch processing and regulatory compliance reporting — not for the streaming inputs that production AI applications require. Underwriting workflows built around nightly batch jobs cannot support real-time risk scoring. Claims platforms optimized for audit trails cannot feed continuous model retraining.
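The batch-versus-streaming constraint can be made concrete. The sketch below is a minimal Python illustration, with hypothetical field names rather than any actual carrier schema: a nightly-batch job cannot score an event that arrives after its cutoff until the next run, while an event-driven handler scores it on arrival.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical event shape; real PAS extracts carry far more fields.
@dataclass
class PolicyEvent:
    policy_id: str
    risk_factor: float
    received_at: datetime

def score(event: PolicyEvent) -> float:
    """Stand-in for a model call; a real system would invoke a deployed model."""
    return min(1.0, event.risk_factor * 0.8)

def batch_score(events: list[PolicyEvent], cutoff: datetime) -> dict[str, float]:
    """Nightly-batch pattern: only events in the last extract get scored."""
    return {e.policy_id: score(e) for e in events if e.received_at <= cutoff}

def on_event(event: PolicyEvent, scores: dict[str, float]) -> None:
    """Event-driven pattern: score immediately on arrival."""
    scores[event.policy_id] = score(event)

# An event arriving after last night's cutoff waits up to a day in the
# batch world, but is scored at once in the event-driven world.
cutoff = datetime(2025, 1, 1, 23, 59)
late = PolicyEvent("P-1", 0.5, datetime(2025, 1, 2, 9, 0))
assert "P-1" not in batch_score([late], cutoff)
live: dict[str, float] = {}
on_event(late, live)
assert live["P-1"] == 0.4
```

The gap is architectural, not algorithmic: the same scoring function sits in both paths, but only one path can feed a real-time underwriting decision.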
OpenAI, Anthropic, Google, and a dozen other providers are delivering models that would generate significant carrier value if they could reliably surface the right data. They cannot, because that data is siloed, inconsistently formatted, and locked in systems with no meaningful API layer.
How 40-Year-Old Policy Administration Systems Become AI Deployment Tombstones
The policy administration system (PAS) is where most insurance GenAI initiatives stall. The majority of carriers run on platforms like Guidewire, Duck Creek, or legacy mainframe architectures that pre-date the API economy by decades. These systems were engineered for accuracy, auditability, and batch throughput — design priorities that are directly incompatible with the low-latency, event-driven data flows that GenAI applications require in production.
Intellias estimates that 70% of insurance IT budgets are consumed by legacy infrastructure maintenance. That allocation leaves carriers spending the overwhelming majority of their technology budgets simply keeping aging systems operational, with roughly 30% available for transformation work. Credence Research projects $6.5-10 billion in insurance IT modernization spend across North America and EMEA from 2024 through 2026 — which sounds substantial until you compare it against the scale of modernization required and the rate at which AI capability is advancing during that same window.
The compounding problem is institutional knowledge erosion. With 60% of mainframe specialists approaching retirement age, carriers face accelerating loss of the human capital that understands how those legacy systems actually work internally. AI initiatives that depend on accurate data lineage from legacy platforms are being designed on top of systems whose internal logic is increasingly opaque to the people running them.
The Organizational Antibodies That Kill Every Promising Pilot
BCG's research attributes 70% of AI scaling failures to organizational and human factors rather than technical limitations. The most common manifestation is what might be called the actuarial-AI culture clash: insurance's centuries-long tradition of actuarial precision creates genuine organizational resistance to the probabilistic outputs that characterize modern machine learning.
An AI underwriting model that is accurate 94% of the time is, by actuarial standards, wrong 6% of the time. In a culture where underwriting decisions carry fiduciary and regulatory weight, that error rate is not a performance metric — it is a liability. This framing prevents production deployment of models that would, in aggregate, substantially outperform human underwriters operating within the same constraints.
Deloitte's survey found that underfunding was not meaningfully cited as a failure factor, which locates the real problem precisely. Carriers are not short of capital for AI initiatives; they are short of organizational structures capable of absorbing the accountability that running AI at scale demands. Compliance teams, actuarial departments, and legal functions each operate as effective veto points on deployment decisions, and each has incentive structures that reward caution over velocity.
What the 7% Who've Scaled Are Actually Doing Differently
The carriers that have escaped pilot purgatory share a consistent pattern, starting with investment levels rather than model selection. BCG's data shows the 7% who have scaled enterprise-wide are spending $25 million or more on AI initiatives, while most carriers remain below $5 million. At sub-$5 million budgets, organizations fund individual use cases. At $25 million and above, they fund the data infrastructure, change management, and governance frameworks that production deployment actually demands.
The second differentiator is structural: successful scalers appoint business-aligned product owners for each AI initiative rather than treating deployment as a technology project with business stakeholders consulted on the side. Deloitte's research found that close collaboration across business, technology, data, and talent functions was "the biggest reason for success" among carriers that cleared the pilot-to-production threshold.
The third factor is sequencing. The 7% treat data infrastructure modernization as a prerequisite for AI deployment, not a parallel workstream. They build the connective tissue first: unified data layers, API-enabled policy administration integrations, and real-time event streaming capabilities. The GenAI models come after the plumbing is in place. Every carrier that inverts this sequence ends up with the same result: a compelling demo that cannot survive contact with production data volumes and edge cases.
Pilot Purgatory Has a Price Tag — And It Compounds Every Quarter
The cost of staying in pilot mode is not just the direct spend on initiatives that produce no scaled value. It is the compounding competitive gap that widens as the 7% extend their operational advantages into pricing accuracy, loss ratios, and customer acquisition costs. A carrier running AI-powered underwriting at enterprise scale can price risk more precisely than a carrier underwriting manually or with legacy models; over a three-to-five year horizon, that difference surfaces in combined ratios.
Insurance Thought Leadership projects cost reductions of 20-40% for early AI adopters across claims, onboarding, and back-office operations. Carriers still in pilot mode in 2026 are not just missing those savings; they are competing against carriers already banking them.
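To put the 20-40% range in perspective, a back-of-envelope calculation (the premium and expense figures below are assumptions for illustration, not drawn from the cited studies):

```python
# Illustrative arithmetic only; premium and expense figures are assumed.
premium = 1_000_000_000                      # annual written premium, $
operating_expense = 280_000_000              # claims handling + back office, $
reduction_low, reduction_high = 0.20, 0.40   # early-adopter range cited above

savings_low = operating_expense * reduction_low    # roughly $56M per year
savings_high = operating_expense * reduction_high  # roughly $112M per year

# Expressed as expense-ratio improvement, in percentage points of premium:
pts_low = savings_low / premium * 100    # about 5.6 points
pts_high = savings_high / premium * 100  # about 11.2 points
```

Even at the low end of the range, that is several points of expense ratio, which flows straight into combined ratio and pricing headroom.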
The technical barriers are real but solvable. Data lake modernization, API layer development, and legacy system migration timed to natural policy administration renewal cycles constitute a known solution set. The organizational barriers are harder, but they yield to competitive pressure applied consistently. What the aggregate billions in pilot spend have purchased is organizational familiarity with GenAI's capability envelope. Most carriers now know what the models can do. The question for 2026 is whether they will build the infrastructure and institutional will to let those models do it at scale — or whether the proof of concept simply becomes the product.
Frequently Asked Questions
Why do insurance carriers keep funding new AI pilots if the failure rate is so high?
Pilots are politically safe because they generate demonstrations rather than accountability. According to BCG's research, most carriers stay below $5 million in AI investment, which funds individual use cases without the structural commitments that production deployment demands. The pilot model also allows technology and innovation teams to demonstrate activity without the operational risk exposure that enterprise-wide rollout requires.
Is legacy system replacement a prerequisite for scaling GenAI in insurance?
For most carriers, modernizing the data layer is effectively a prerequisite. PwC identifies data infrastructure as "the biggest challenge by far" in GenAI deployment, and systems that cannot surface policy, claims, and customer data in real time cannot support production AI applications. Some carriers are using API middleware and retrieval-augmented generation to extract value from legacy platforms without full replacement, but this approach has significant limitations for high-volume, real-time use cases such as dynamic underwriting.
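A minimal sketch of that middleware pattern, assuming a read-only extract from the policy administration system (all record contents and identifiers below are hypothetical): legacy data is exposed through a retrieval function, and the model sees only the retrieved context, never the legacy system itself.

```python
# Hypothetical legacy extract; in practice this sits behind an API
# middleware layer over the policy administration system.
legacy_export = {
    "POL-100": "Homeowners policy, effective 2019, two prior water-damage claims.",
    "POL-200": "Auto policy, effective 2022, clean claims history.",
}

def retrieve(query: str, records: dict[str, str], k: int = 1) -> list[str]:
    """Toy keyword-overlap ranking; production RAG uses vector search."""
    terms = set(query.lower().split())
    ranked = sorted(records.values(),
                    key=lambda rec: len(terms & set(rec.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query: str, records: dict[str, str]) -> str:
    """Assemble retrieved legacy context plus the question for a model call."""
    context = "\n".join(retrieve(query, records))
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("any prior water-damage claims?", legacy_export)
```

The limitation noted above applies here too: retrieval over a static extract can support question answering, but not the high-volume, real-time scoring that dynamic underwriting demands.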
What regulatory factors are slowing insurance AI deployment beyond pilots?
NAIC model bulletins on AI systems have been adopted by 11 states plus Washington D.C., creating compliance requirements around explainability, bias testing, and audit trails that add deployment friction. Only about 5% of insurers currently have fully mature AI governance frameworks, according to Insurance Thought Leadership, though nearly 70% of large enterprises are actively investing in fairness controls and audit mechanisms for 2026 deployment cycles.
How long does it actually take for insurance AI investments to deliver ROI?
Most carriers expect two to four years to see satisfactory returns from AI investments, significantly longer than the seven-to-twelve month payback expected from conventional IT projects. MIT's NANDA State of AI in Business 2025 report found that 95% of GenAI pilots delivered no measurable P&L impact, consistent with PwC's recommendation that carriers recalibrate both the pace and the measurement framework for GenAI returns rather than applying standard IT investment timelines.
What is the competitive risk for carriers that remain in pilot mode through 2026?
The risk is compounding underwriting and operational disadvantage. BCG's data shows scaled carriers are operating at investment levels five to twenty times higher than those still piloting, and the advantages in pricing accuracy and loss ratio management widen every quarter. Insurance Thought Leadership projects 20-40% cost reductions in claims and back-office operations for early AI adopters, meaning scaled carriers are banking structural savings that pilot-stage competitors are still projecting on spreadsheets.