Most actuarial departments still run on Excel. That is not a criticism. Excel works, it is familiar, and every actuary on earth knows how to use it. But when your quarterly reserve analysis involves pulling millions of claim transactions, running development patterns across 15 lines of business, fitting severity curves, and producing 40 exhibits for the board package, Excel starts to break down in ways that cost you time, accuracy, and sanity.
This guide walks through the modern actuarial tech stack that an increasing number of teams are adopting: Python for analysis and automation, Google BigQuery for data warehousing, and Docker for reproducibility. Each piece delivers value on its own, and they work even better together.
Why Python Over Excel for Actuarial Work
Three things make Python the natural choice for actuarial teams moving beyond spreadsheets.
First, pandas. The pandas library is essentially Excel on steroids for tabular data. Reading a CSV, filtering it, pivoting, merging with another dataset, and writing the result to Excel takes about ten lines of code. Once you learn pandas, doing the same operations manually in spreadsheets feels unnecessary.
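A sketch of that workflow with tiny inline data (in practice the frames would come from read_csv or a database pull; all column and file names here are invented for illustration):

```python
import pandas as pd

# In practice these would come from pd.read_csv("claims.csv") etc.;
# tiny inline frames keep the sketch self-contained.
claims = pd.DataFrame({
    "policy_id": [1, 1, 2, 2],
    "accident_year": [2020, 2021, 2020, 2021],
    "line_of_business": ["auto", "auto", "property", "property"],
    "status": ["open", "closed", "open", "open"],
    "paid_loss": [1000.0, 2500.0, 400.0, 800.0],
})
policies = pd.DataFrame({"policy_id": [1, 2], "premium": [5000.0, 3000.0]})

# Filter, merge, pivot -- the same steps done by hand in a spreadsheet
open_claims = claims[claims["status"] == "open"]
merged = open_claims.merge(policies, on="policy_id", how="left")
summary = merged.pivot_table(
    index="accident_year", columns="line_of_business",
    values="paid_loss", aggfunc="sum",
)
print(summary)
# summary.to_excel("open_claims_summary.xlsx")  # write back out for review
```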
Second, the ecosystem. Python has libraries for everything actuaries do. statsmodels for GLMs. scipy for curve fitting and statistical distributions. scikit-learn for machine learning. And chainladder, a library that implements chain ladder, Bornhuetter-Ferguson, and other standard reserving methods natively in Python. You do not need to build those from scratch.
Third, the job market. Every actuarial job posting now lists Python as preferred or required. R is fine, SAS is legacy, and Julia is interesting but has almost no adoption outside of quant finance. Python is the safe bet and the one your team is most likely to already know.
BigQuery: Cloud-Native Data for Actuarial Teams
BigQuery solves two problems that actuarial teams constantly fight: scale and speed. A query over a claims dataset with 8 million transaction records going back 8 years takes 2 to 5 minutes on an on-premise SQL Server instance; the same query returns in about 8 seconds on BigQuery. That is not an exaggeration. BigQuery is a columnar data warehouse designed to scan massive datasets fast.
BigQuery is also cloud-native. No VPN into the office network. No waiting for IT to provision server access. You authenticate with a service account, run a query from a Python script on your laptop, and get results back in seconds. That flexibility matters at 9 PM before a board meeting.
Cost is reasonable. BigQuery charges by data scanned per query. For typical actuarial volumes, expect approximately $200 per month, a fraction of on-premise SQL Server licensing and hardware.
Structuring Actuarial Data in BigQuery
The recommended approach uses a few core tables: a claim transaction table partitioned by accident year and clustered by line of business, a policy table with exposure and premium data, a large loss register, and reference tables for mapping loss cause codes to development patterns. Partitioning ensures BigQuery only scans the accident years you query, reducing cost and improving speed. Clustering organizes data physically by line of business for even faster filtered queries.
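A sketch of what that table definition might look like in BigQuery DDL, with hypothetical project, dataset, and column names (BigQuery partitions on integer columns via RANGE_BUCKET):

```sql
-- Hypothetical schema: integer-range partitioned by accident_year,
-- clustered by line of business
CREATE TABLE `project.dataset.claim_transactions` (
  claim_id         STRING,
  accident_year    INT64,
  line_of_business STRING,
  transaction_date DATE,
  paid_loss        NUMERIC,
  case_reserve     NUMERIC
)
PARTITION BY RANGE_BUCKET(accident_year, GENERATE_ARRAY(2000, 2040, 1))
CLUSTER BY line_of_business;
```

With this layout, a query filtered to `accident_year >= 2015` scans only those partitions rather than the full table.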
Connecting Python to BigQuery
The google-cloud-bigquery library makes the connection straightforward. Authenticate with a service account (a JSON key file from your cloud admin), then query in a few lines:
```python
from google.cloud import bigquery

# Uses Application Default Credentials (e.g., the service account key
# referenced by GOOGLE_APPLICATION_CREDENTIALS)
client = bigquery.Client()

query = """
SELECT *
FROM `project.dataset.claims`
WHERE accident_year >= 2015
"""
df = client.query(query).to_dataframe()
```
That returns a pandas DataFrame ready for analysis. Never hardcode service account credentials in your script. Store the JSON key file in a secure location and reference it through an environment variable. Docker makes this particularly clean because you pass the credential path at runtime without baking it into the container image.
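The pattern in practice might look like the sketch below; the key path and image name are placeholders, not real resources:

```python
import os

# Set outside the code -- in your shell:
#   export GOOGLE_APPLICATION_CREDENTIALS=/secrets/bq-key.json
# or at container start:
#   docker run -v /secrets:/secrets:ro \
#       -e GOOGLE_APPLICATION_CREDENTIALS=/secrets/bq-key.json your-image
key_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")
print("Credential path configured:", bool(key_path))

# With the variable set, the client needs no explicit key handling:
# from google.cloud import bigquery
# client = bigquery.Client()
```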
SQL vs. Python: When to Use Each
A good rule of thumb: do filtering, joining, and aggregation in SQL. Do analysis, modeling, and output formatting in Python. BigQuery is extremely efficient at scanning and aggregating large datasets. If you need total paid losses by accident year, development period, and line of business, write a SQL query. Pulling 30 million raw transactions into Python and aggregating in pandas would be slower and waste memory.
Once you have the aggregated data (a loss triangle with a few hundred cells), Python is the right tool for development factor selection, method application, diagnostic charts, and Excel output. For frequently queried datasets, create materialized views in BigQuery. These pre-compute results so queries return almost instantly without re-scanning the underlying data, functioning like a cached pivot table that updates automatically.
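A sketch of such a materialized view, with placeholder project, dataset, and column names:

```sql
-- Hypothetical materialized view pre-aggregating the paid-loss triangle
CREATE MATERIALIZED VIEW `project.dataset.paid_triangle_mv` AS
SELECT
  line_of_business,
  accident_year,
  development_period,
  SUM(paid_loss) AS paid_loss
FROM `project.dataset.claim_transactions`
GROUP BY line_of_business, accident_year, development_period;
```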
The Daily Actuarial Workflow in Python
Here is what a typical reserving day looks like with this stack:
1. Pull the latest data. A Python script connects to BigQuery, runs a SQL query to pull claim transactions, paid losses, case reserves, and earned premium by accident year and line of business, and loads the results into a pandas DataFrame. About 15 seconds for the full dataset.
2. Build loss triangles programmatically. Instead of manually copying numbers into an Excel template, pandas pivot tables reshape transaction data into development triangles. The chainladder library can also do this natively. A complete set of triangles for every line of business builds in under a minute.
3. Run actuarial methods. Chain ladder with volume-weighted and simple average factors. Bornhuetter-Ferguson using a priori loss ratios. Cape Cod. All three methods across all lines, compared side by side, in a single script. Changing an assumption (like excluding a specific large loss from the development pattern) is a one-line code change instead of editing 15 spreadsheet tabs.
4. Produce board exhibits. Output goes to Excel using openpyxl with formatted exhibits, headers, conditional formatting, and the exact layout management expects. The script produces the entire board package (approximately 40 exhibits) in about three minutes.
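The triangle-building and factor-selection steps can be sketched in plain pandas with toy numbers (the chainladder library wraps this logic, but the volume-weighted factor calculation is simple enough to show directly):

```python
import pandas as pd

# Toy aggregated data: cumulative paid loss by accident year and
# development age in months. In practice this comes from the BigQuery pull.
data = pd.DataFrame({
    "accident_year": [2020, 2020, 2020, 2021, 2021, 2022],
    "dev_age":       [12,   24,   36,   12,   24,   12],
    "paid":          [100., 150., 165., 110., 170., 120.],
})

# Reshape transactions into a development triangle
tri = data.pivot_table(index="accident_year", columns="dev_age", values="paid")

# Volume-weighted age-to-age factors: sum of losses at the later age over
# sum at the earlier age, restricted to years that have both ages
ages = list(tri.columns)
ldfs = {}
for a, b in zip(ages, ages[1:]):
    both = tri[[a, b]].dropna()
    ldfs[(a, b)] = both[b].sum() / both[a].sum()

print(tri)
print(ldfs)
```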
Most of the time savings comes not from the actuarial analysis itself, but from eliminating data wrangling. Pulling data from multiple systems, matching, reconciling, copying into templates, error-checking: all of that manual work is gone when the pipeline is automated and the data source is a single BigQuery dataset. The actual actuarial judgment (selecting factors, weighting methods, identifying unusual patterns) still takes human time, and it should.
Docker for Actuaries: Reproducibility That Auditors Love
Docker solves one of the most common problems in actuarial computing: "it works on my machine." A Docker container is a sealed box that contains everything a script needs to run: the Python version, all libraries, and configuration files. The environment is identical no matter where it runs: your laptop, a colleague's laptop, or a cloud server.
A Dockerfile is the recipe for building that box. For an actuarial workflow, it looks like this:
```dockerfile
FROM python:3.11-slim
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY scripts/ /app/scripts/
CMD ["python", "/app/scripts/run_reserves.py"]
```
Five lines. The requirements.txt file pins every library and its exact version: pandas==2.1.4, chainladder==0.8.19, google-cloud-bigquery==3.14.1, and so on. Pinning versions is critical because library updates can change calculation results. You do not want reserve numbers shifting because pandas released a new version that handles floating-point rounding differently.
Why Reproducibility Matters for Regulators
When a regulator asks how you calculated your reserves, you can point to a Docker image that contains the exact code and exact environment that produced the numbers. That level of reproducibility is something Excel fundamentally cannot provide.
Onboarding also becomes trivial. Instead of a 45-minute setup document, a new analyst installs Docker, pulls the image, and runs the container. Done.
Automated Scheduling with CI/CD
Docker containers can be scheduled to run automatically using CI/CD pipelines. GitHub Actions is a practical choice: every quarter, on a scheduled date, the pipeline pulls the latest Docker image, runs the reserve analysis, and deposits the output Excel files in a shared Google Cloud Storage bucket. The team gets a notification that exhibits are ready for review. The entire quarterly process kicks off without anyone clicking a button.
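A minimal GitHub Actions workflow along those lines; the cron schedule, image name, and credential path are placeholders:

```yaml
# Hypothetical quarterly schedule; adjust the cron and image to your setup
name: quarterly-reserves
on:
  schedule:
    - cron: "0 6 1 1,4,7,10 *"   # 06:00 UTC on the 1st of each quarter
  workflow_dispatch: {}           # allow manual runs too
jobs:
  run-reserves:
    runs-on: ubuntu-latest
    steps:
      - name: Run reserve analysis in the pinned container
        run: |
          docker pull ghcr.io/your-org/reserve-image:latest
          docker run --rm \
            -e GOOGLE_APPLICATION_CREDENTIALS=/secrets/bq-key.json \
            ghcr.io/your-org/reserve-image:latest
```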
Beyond Reserving: Pricing, Large Loss, and Cat Models
Pricing GLMs
Generalized linear models for commercial lines pricing follow a similar workflow: pull policy and loss data from BigQuery, clean and feature-engineer in pandas, and fit the model in Python. statsmodels gives you the coefficient tables, p-values, and diagnostic plots that actuaries expect. scikit-learn is the better choice when you want gradient boosted trees or other machine learning approaches running alongside the GLM for comparison.
Large Loss Analysis
Fitting severity distributions (lognormal, Pareto, mixed exponential) to large loss data is a single function call in Python versus Solver and extensive manual setup in Excel:
```python
from scipy import stats

# Fit a lognormal to large loss data;
# params contains (shape, loc, scale) immediately
params = stats.lognorm.fit(large_loss_data)
```
You get fitted parameters back instantly and can plot the fitted distribution against empirical data to assess goodness of fit.
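One way to sketch that check with simulated data, using a Kolmogorov-Smirnov test against the fitted distribution (pinning the location at zero, as is common for loss severities):

```python
import numpy as np
from scipy import stats

# Simulated "large loss" data so the sketch is self-contained
rng = np.random.default_rng(42)
large_loss_data = rng.lognormal(mean=12.0, sigma=1.5, size=2000)

# Fit with the location pinned at zero
shape, loc, scale = stats.lognorm.fit(large_loss_data, floc=0)

# Quick goodness-of-fit check: KS test against the fitted CDF
ks = stats.kstest(large_loss_data, stats.lognorm(shape, loc, scale).cdf)
print(f"sigma ~ {shape:.2f}, median ~ {scale:,.0f}, KS p-value = {ks.pvalue:.3f}")
```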
Cat Model Output Processing
This is where Excel completely falls apart. Cat model simulation output often contains millions of rows, one per simulated event per contract. Excel cannot open the file. In Python with pandas, you can read, aggregate, compute exceedance probability curves, and produce summary statistics for the reinsurance team in a couple of minutes.
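A toy version of that aggregation, with a simulated event-loss table standing in for real cat model output:

```python
import numpy as np
import pandas as pd

# Simulated event-loss table: one row per event, tagged with its
# simulation year (a real cat model export has millions of these rows)
rng = np.random.default_rng(0)
n_years = 10_000
elt = pd.DataFrame({
    "year": rng.integers(0, n_years, size=50_000),
    "loss": rng.pareto(2.0, size=50_000) * 1e6,
})

# Aggregate to annual losses; years with no event count as zero
annual = elt.groupby("year")["loss"].sum().reindex(range(n_years), fill_value=0.0)

# Aggregate exceedance probability curve at standard return periods
for rp in (10, 50, 100, 250):
    aep = np.quantile(annual, 1 - 1 / rp)
    print(f"{rp}-year AEP loss: {aep:,.0f}")
```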
Ad Hoc Requests
The Chief Actuary wants commercial auto loss ratios by state over five years, broken out by policy limit. In the old world: an IT ticket, a three-day wait, and a CSV that still needs formatting. With this stack: a BigQuery query, a pandas aggregation, and a formatted chart on their desk in 20 minutes.
Version Control, Visualization, and Security
Git for Actuarial Work
Everything lives in Git: every script, every SQL query, every configuration file. When an assumption changes or a calculation is modified, the change is committed with a message explaining what was done and why. If something breaks or produces unexpected results, the Git history shows exactly what changed.
Compare that to the Excel approach: Reserve_Model_v3_final_FINAL_revised.xlsx on a shared drive, and nobody knows which version produced last quarter's board package. With Git, every version is tracked, every change is documented, and any prior quarter's results can be reproduced by checking out the appropriate commit.
Visualization
matplotlib handles static charts for reports and board exhibits. It is not the prettiest library by default, but a style template matching company branding (specific colors, fonts, layout) produces clean, professional, consistent output. plotly is better for interactive exploration where you want to zoom, hover over data points, and toggle series. It is particularly useful for loss development plots comparing multiple accident years.
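A minimal example of the style-template idea; the hex colors and numbers are placeholders standing in for company branding and real triangles, and the Agg backend keeps it runnable in a script:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

# A small reusable style block applied before any plotting
plt.rcParams.update({"font.size": 10, "axes.grid": True, "figure.dpi": 110})

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot([12, 24, 36], [100, 150, 165], marker="o", color="#1b365d", label="AY 2020")
ax.plot([12, 24], [110, 170], marker="o", color="#c8102e", label="AY 2021")
ax.set_xlabel("Development age (months)")
ax.set_ylabel("Cumulative paid loss")
ax.set_title("Loss Development by Accident Year")
ax.legend()
fig.savefig("development.png")
```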
For dashboards, Streamlit is worth evaluating. It is a Python framework that builds interactive web apps with minimal code (dropdown menus, sliders, real-time filtering) letting underwriters explore loss ratios by state, line, and policy year without touching a spreadsheet.
Data Security
Claims data contains PII: names, addresses, injury details. BigQuery provides robust access controls through IAM roles. You can grant specific users read access to specific datasets or tables, and create views that mask sensitive columns so analysts see claim amounts but not claimant names.
For credentials, Google Cloud's Workload Identity Federation eliminates JSON key files in production entirely. The Docker container authenticates automatically in the cloud environment. For local development, service account keys go in an encrypted volume excluded from Git via .gitignore.
The Excel Bridge: Output That Stakeholders Accept
Your stakeholders want Excel. Your regulators want Excel. Your board wants PowerPoint slides with tables that were probably copied from Excel. The output format is non-negotiable, and every Python-using actuary has to accept that.
openpyxl gives full control over formatting: column widths, number formats, conditional formatting, colors, borders, merged cells, named ranges. The output can be made identical to manually-built exhibits. The key insight is separating analysis from presentation. Python does the analysis. Excel is the presentation layer. Every number in the output spreadsheet is a value, not a formula. If someone asks why a number changed, you point to the Git commit, not to a nested Excel formula.
For quick outputs, pandas.to_excel() works fine. For anything that needs to look polished, openpyxl with a template (company logo, standard headers, formatting) is the way to go. The script fills in the data and the template handles presentation.
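A small openpyxl sketch of a formatted exhibit; the layout and figures are invented for illustration:

```python
from openpyxl import Workbook
from openpyxl.styles import Font

# Minimal formatted exhibit: a bold header row plus values written as numbers
wb = Workbook()
ws = wb.active
ws.title = "Exhibit 1"

ws.append(["Accident Year", "Ultimate Loss"])
for cell in ws[1]:
    cell.font = Font(bold=True)

for year, ultimate in [(2020, 1_250_000), (2021, 1_410_000)]:
    ws.append([year, ultimate])

# Number format and column widths set in code, not by hand in Excel
for row in ws.iter_rows(min_row=2, min_col=2, max_col=2):
    for cell in row:
        cell.number_format = "#,##0"
ws.column_dimensions["A"].width = 14
ws.column_dimensions["B"].width = 16

wb.save("exhibit1.xlsx")
```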
Jupyter Notebooks vs. Production Scripts
For exploration and prototyping, Jupyter is great. Investigating a data issue, testing a new method, doing ad hoc analysis where you want to see intermediate results. The iterative cell-by-cell workflow is perfect for discovery.
For production, notebooks have real problems. Cells can be executed out of order. Hidden state accumulates in memory. The file format mixes output with code, making version control difficult. And notebooks cannot be easily scheduled or automated.
The practical rule: if you are going to run an analysis more than twice, it becomes a .py script. Prototype in a notebook, then extract the logic into a clean Python file with proper functions, error handling, and logging. The notebook stays in a sandbox folder for reference but is not part of the production pipeline.
Getting Started: Where to Begin
1. Automate one manual process. Pick the most painful, repetitive task you do, maybe building a loss triangle or reformatting an exhibit. Automate that one thing. When it works and saves you time, your motivation to learn more will be natural.
2. Learn pandas deeply. Not just read_csv and to_excel. Learn groupby, merge, pivot_table, multi-level indexing, and method chaining. pandas is roughly 80% of actuarial Python work.
3. Do not over-engineer early. Your first script should be a single .py file that does one thing well. You do not need classes, frameworks, or databases. A flat script with comments is perfectly fine. Refactor later when you understand the patterns.
4. Test against known results. When you build a chain ladder in Python, run it on the same data you have in Excel and verify the numbers match to the penny. Actuaries are held to a standard of precision that software engineers are not. Your code needs to produce numbers you would sign your name to.
5. Show, do not tell. Nobody cares about a presentation on why Python is better than Excel. They care about results. When the quarterly reserve process goes from two weeks to three minutes with the same output, management is immediately on board.
Framing It for Your Team
Python is not a replacement for Excel. It is a tool that handles the parts Excel is bad at: large data, automation, reproducibility, and version control. Excel is still there for quick calculations, reviewing results, and communicating with non-technical stakeholders. Framing it that way makes the conversation much easier.
And produce documentation. Every script should have a README that explains what it does, how to run it, and what inputs and outputs to expect. If you get hit by a bus tomorrow, someone on the team can pick up your code and run the quarterly process. That matters to management more than any technical argument about efficiency.
Key Python Packages for Actuaries

Pulling together the libraries mentioned throughout this guide:

- pandas: tabular data manipulation, pivoting, merging, and Excel I/O
- chainladder: chain ladder, Bornhuetter-Ferguson, and other standard reserving methods
- statsmodels: GLMs with the coefficient tables and diagnostics actuaries expect
- scipy: curve fitting and statistical distributions
- scikit-learn: machine learning, including gradient boosted trees
- google-cloud-bigquery: querying BigQuery from Python
- openpyxl: fully formatted Excel output
- matplotlib and plotly: static and interactive charts
- Streamlit: interactive dashboards for non-technical users
The Bottom Line
The modern actuarial tech stack is not about chasing trends or impressing anyone with fancy tools. It is about solving real problems every actuarial team faces: slow data access, manual repetitive work, unreproducible results, and the version control nightmare of shared Excel files.
You do not need to adopt everything at once. Start with Python and pandas for one process. Add BigQuery when your data outgrows local queries. Add Docker when you need other people to run your code reliably. Each piece delivers value on its own. The actuaries who invest in these skills now will have a meaningful advantage as the industry continues to modernize, not because the tools are new, but because they solve the right problems and free you to spend your time on work that actually requires actuarial judgment.