Exam MAS-II — Generalized Linear Models Flashcards

Generalized linear models for CAS Exam MAS-II: the exponential dispersion family with its mean and variance functions, the three GLM components and canonical/log/logit links, offsets for exposure, maximum-likelihood fitting by iteratively reweighted least squares, deviance, scaled deviance, Pearson chi-square and their residuals, nested-model drop-in-deviance and F tests, AIC/BIC, overdispersion and quasi-likelihood, and multiplicative insurance ratemaking — with fully worked predicted means, deviance tests, and coefficient interpretations.

44 cards · 6 topics · Free · fact-checked · LaTeX math

Exponential dispersion family
State the **exponential dispersion family (EDF)** density and name each piece.
$f(y;\theta,\phi)=\exp\!\left\{\frac{y\theta-b(\theta)}{\phi}+c(y,\phi)\right\}$. $\theta$ = **canonical (natural) parameter**, a function of the mean. $\phi$ = **dispersion parameter** (scale). $b(\theta)$ = **cumulant function**; its derivatives give the moments. $c(y,\phi)$ = normalizing term not involving $\theta$. This single form contains the normal, Poisson, gamma, binomial, and inverse-Gaussian distributions.
Exponential dispersion family
From the EDF, give the **mean** and **variance** of $Y$ in terms of $b(\theta)$.
$\mu=E[Y]=b'(\theta)$ and $\text{Var}(Y)=\phi\,b''(\theta)$. Since $b''(\theta)$ can be written as a function of the mean, define the **variance function** $V(\mu)=b''(\theta)$, so $\text{Var}(Y)=\phi\,V(\mu)$. The mean comes from the **first** derivative of the cumulant function and the variance from the **second**.
Exponential dispersion family
Show that the **Poisson** distribution is in the EDF and identify $\theta$, $b(\theta)$, $\phi$, and $V(\mu)$.
$P(Y=y)=\frac{e^{-\mu}\mu^{y}}{y!}=\exp\{y\ln\mu-\mu-\ln y!\}$. Matching the EDF: $\theta=\ln\mu$, so $\mu=e^{\theta}$ and $b(\theta)=e^{\theta}=\mu$; $\phi=1$; $c(y,\phi)=-\ln y!$. Then $b'(\theta)=e^{\theta}=\mu$ (mean) and $b''(\theta)=e^{\theta}=\mu$, so $V(\mu)=\mu$ and $\text{Var}(Y)=\mu$.
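The Poisson identification above can be checked numerically. The sketch below is illustrative only: the helper names `poisson_pmf` and `edf_density` and the choice $\mu=3$ are assumptions, not part of the deck. It compares the usual pmf with the EDF form and approximates $b'(\theta)$ and $b''(\theta)$ by finite differences.

```python
import math

def poisson_pmf(y, mu):
    """Standard Poisson probability mass function."""
    return math.exp(-mu) * mu**y / math.factorial(y)

def edf_density(y, theta, phi, b, c):
    """Generic EDF density exp{(y*theta - b(theta))/phi + c(y, phi)}."""
    return math.exp((y * theta - b(theta)) / phi + c(y, phi))

mu = 3.0                                   # illustrative mean (assumption)
theta = math.log(mu)                       # canonical parameter theta = ln(mu)
b = math.exp                               # cumulant function b(theta) = e^theta
c = lambda y, phi: -math.lgamma(y + 1)     # c(y, phi) = -ln(y!)

# The two forms should agree for every count y.
max_gap = max(abs(poisson_pmf(y, mu) - edf_density(y, theta, 1.0, b, c))
              for y in range(15))

# Central finite differences approximate b'(theta) and b''(theta).
h = 1e-4
b1 = (b(theta + h) - b(theta - h)) / (2 * h)
b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h**2

print(max_gap, b1, b2)   # gap near 0; both derivatives near mu = 3
```

Both derivatives come out close to $\mu$, matching $V(\mu)=\mu$ and $\phi=1$.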
Exponential dispersion family
Show the **normal** $N(\mu,\sigma^{2})$ distribution is in the EDF and identify $\theta$, $b(\theta)$, $\phi$, $V(\mu)$.
$f(y)=\exp\!\left\{\frac{y\mu-\mu^{2}/2}{\sigma^{2}}-\frac{y^{2}}{2\sigma^{2}}-\frac{1}{2}\ln(2\pi\sigma^{2})\right\}$. Thus $\theta=\mu$, $b(\theta)=\frac{\theta^{2}}{2}$, and $\phi=\sigma^{2}$. $b'(\theta)=\theta=\mu$ (mean) and $b''(\theta)=1$, so $V(\mu)=1$ and $\text{Var}(Y)=\sigma^{2}\cdot 1=\sigma^{2}$ — the variance is constant, independent of the mean.
Link & variance functions
Give the **variance function** $V(\mu)$ for the normal, Poisson, gamma, binomial (proportion), and inverse-Gaussian distributions.
Normal: $V(\mu)=1$. Poisson: $V(\mu)=\mu$. Gamma: $V(\mu)=\mu^{2}$. Binomial proportion: $V(\mu)=\mu(1-\mu)$. Inverse Gaussian: $V(\mu)=\mu^{3}$. The variance function is the GLM's fingerprint: it tells you how the response's spread scales with its mean and so dictates the distributional choice.
Link & variance functions
Define the **Tweedie** family and explain why it suits aggregate insurance losses.
The Tweedie family has power variance function $V(\mu)=\mu^{p}$. Special cases: $p=0$ normal, $p=1$ Poisson, $p=2$ gamma, $p=3$ inverse Gaussian. For $1<p<2$ the Tweedie is a **compound Poisson–gamma**: a point mass at zero (no claims) plus a continuous positive part (claim amounts). This matches the shape of pure-premium data (many exact zeros mixed with positive aggregate losses), so a single Tweedie GLM can model pure premium directly where frequency and severity would otherwise be modeled separately.
Link & variance functions
Name the **three components** of a generalized linear model.
1. **Random component:** the response $Y$ has a distribution from the exponential dispersion family with mean $\mu$. 2. **Systematic component (linear predictor):** $\eta=X\beta=\beta_0+\beta_1 x_1+\cdots+\beta_p x_p$, a linear combination of covariates. 3. **Link function:** $g(\mu)=\eta$ connects the mean to the linear predictor, so $\mu=g^{-1}(\eta)$. Ordinary linear regression is the special case: normal random component, identity link.
Link & variance functions
What is a **link function** $g$, and what must it satisfy?
The link relates the mean to the linear predictor: $g(\mu)=\eta=X\beta$, so $\mu=g^{-1}(\eta)$. It must be **monotonic and differentiable** so that $g^{-1}$ exists and maps the unbounded $\eta\in(-\infty,\infty)$ into the valid range of $\mu$. Example: the **log link** $g(\mu)=\ln\mu$ keeps fitted means positive ($\mu=e^{\eta}>0$), which is essential for counts and severities.
Link & variance functions
Define the **canonical link** and give it for the normal, Poisson, gamma, and binomial.
The canonical link sets $\eta=\theta$, i.e. $g(\mu)=\theta(\mu)$, the natural parameter as a function of the mean. Normal: identity, $g(\mu)=\mu$. Poisson: log, $g(\mu)=\ln\mu$. Gamma: inverse (reciprocal), $g(\mu)=\mu^{-1}$ (sometimes the log link is used instead for interpretability). Binomial: logit, $g(\mu)=\ln\frac{\mu}{1-\mu}$. Canonical links give simpler estimating equations and guarantee a concave log-likelihood.
Insurance applications
Why is the **log link** the standard choice for insurance ratemaking?
With a log link $\ln\mu=\beta_0+\beta_1 x_1+\cdots$, exponentiating gives a **multiplicative** model: $\mu=e^{\beta_0}e^{\beta_1 x_1}\cdots e^{\beta_p x_p}$. The base rate $e^{\beta_0}$ is multiplied by a **relativity** $e^{\beta_j}$ for each rating factor — exactly the structure of classification rating plans. It also forces $\mu>0$, which is required for frequencies and severities, and keeps factors from producing negative premiums.
Link & variance functions
Define the **logit link** and the **logistic regression** model for a binary outcome.
For $Y\in\{0,1\}$ with $\mu=P(Y=1)$, the logit link is $g(\mu)=\ln\frac{\mu}{1-\mu}=\eta=X\beta$. Inverting, $\mu=\frac{e^{\eta}}{1+e^{\eta}}=\frac{1}{1+e^{-\eta}}$, a probability in $(0,1)$. The quantity $\frac{\mu}{1-\mu}$ is the **odds**; $e^{\beta_j}$ is the **odds ratio** for a one-unit increase in $x_j$.
Insurance applications
What is an **offset** in a GLM and when is it used?
An offset is a covariate with its coefficient fixed at $1$: $\eta=\ln(\text{exposure})+X\beta$. With a log link this models the **rate per unit exposure**: $\mu=\text{exposure}\cdot e^{X\beta}$, so the linear predictor explains the per-exposure rate while the count scales with exposure. Used in Poisson frequency models where exposure (e.g. car-years, policy-months) varies by record; the offset $\ln(\text{exposure})$ enters with no estimated coefficient.
Insurance applications
A Poisson GLM for claim frequency has $\ln\mu=-2.0+0.4\,x_{\text{urban}}$ with an offset $\ln E$ for exposure $E$. For an urban policy with $E=2.5$ car-years, find the **expected number of claims**.
Linear predictor: $\eta=\ln E + (-2.0)+0.4(1)=\ln 2.5 -1.6$. $\ln 2.5\approx 0.91629$, so $\eta\approx 0.91629-1.6=-0.68371$. Expected claims $=e^{\eta}=e^{-0.68371}\approx 0.5047$. Equivalently $\mu=E\cdot e^{-2.0}e^{0.4}=2.5(0.135335)(1.491825)\approx 2.5(0.201897)\approx 0.5047$ claims.
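The offset calculation can be reproduced in a few lines. This is a minimal sketch; the function name `expected_claims` and its argument names are hypothetical, and the coefficients are the ones given on the card.

```python
import math

def expected_claims(exposure, beta0, beta_urban, urban):
    """Poisson log-link mean with ln(exposure) as an offset (coefficient fixed at 1)."""
    eta = math.log(exposure) + beta0 + beta_urban * urban
    return math.exp(eta)

mu_urban = expected_claims(2.5, -2.0, 0.4, urban=1)    # the card's policy
mu_double = expected_claims(5.0, -2.0, 0.4, urban=1)   # same risk, twice the exposure

print(round(mu_urban, 4))      # about 0.5047 claims
print(mu_double / mu_urban)    # doubling exposure doubles the expected count
```

Because the offset has a fixed coefficient of 1, the expected count is proportional to exposure while $\beta$ describes the per-exposure rate.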
Insurance applications
A log-link severity GLM has $\ln\mu=6.2+0.30\,x_1-0.15\,x_2$. Find the predicted mean severity for $x_1=1,\;x_2=0$, and the relativity for $x_1$.
Linear predictor: $\eta=6.2+0.30(1)-0.15(0)=6.5$. Predicted mean: $\mu=e^{6.5}\approx 665.14$. Relativity for $x_1$: $e^{0.30}\approx 1.3499$ — a one-unit increase in $x_1$ multiplies the expected severity by about $1.35$, a $35\%$ increase, holding $x_2$ fixed.
Insurance applications
How do you interpret a **log-link coefficient** $\beta_j$ as a multiplicative effect?
Under $\ln\mu=\beta_0+\sum\beta_j x_j$, increasing $x_j$ by one unit multiplies the mean by $e^{\beta_j}$: $\frac{\mu_{\text{new}}}{\mu_{\text{old}}}=e^{\beta_j}$. So $e^{\beta_j}$ is the **relativity**; $(e^{\beta_j}-1)\times 100\%$ is the percent change. Example: $\beta_j=0.182$ gives $e^{0.182}\approx 1.20$, a $20\%$ increase per unit of $x_j$.
Insurance applications
A classification log-link GLM has base level $e^{\beta_0}=300$, a Region B relativity $e^{\beta_{\text{B}}}=1.25$, and a Class 2 relativity $e^{\beta_{\text{2}}}=0.80$. Find the predicted pure premium for a **Region B, Class 2** policy.
Multiplicative model: $\mu=e^{\beta_0}\cdot e^{\beta_{\text{B}}}\cdot e^{\beta_{\text{2}}}=300(1.25)(0.80)$. $300\times 1.25=375$; $375\times 0.80=300$. Predicted pure premium $=\$300$. The two relativities offset each other, returning the base. This is the standard multiplicative rating-plan structure a log link produces.
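A multiplicative rating plan is easy to express as a lookup of relativities. The sketch below is an illustration under assumptions: the table layout, the function name `pure_premium`, and the base levels (Region A and Class 1 with relativity 1.00) are not stated on the card.

```python
BASE_RATE = 300.0

# Relativities e^{beta} by rating factor; base levels are assumed to be 1.00.
RELATIVITIES = {
    "region": {"A": 1.00, "B": 1.25},
    "class": {"1": 1.00, "2": 0.80},
}

def pure_premium(base, relativities, risk):
    """Multiply the base rate by the relativity of each rating-factor level."""
    premium = base
    for factor, level in risk.items():
        premium *= relativities[factor][level]
    return premium

pp_b2 = pure_premium(BASE_RATE, RELATIVITIES, {"region": "B", "class": "2"})
pp_base = pure_premium(BASE_RATE, RELATIVITIES, {"region": "A", "class": "1"})
print(pp_b2, pp_base)   # both about 300: the two relativities offset
```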
Insurance applications
In a **logistic regression** for lapse, $\ln\frac{\mu}{1-\mu}=-1.5+0.8\,x$. For $x=1$, find the predicted lapse probability and the odds ratio for $x$.
Linear predictor: $\eta=-1.5+0.8(1)=-0.7$. Probability: $\mu=\frac{1}{1+e^{-\eta}}=\frac{1}{1+e^{0.7}}=\frac{1}{1+2.01375}=\frac{1}{3.01375}\approx 0.3318$. Odds ratio for a one-unit rise in $x$: $e^{0.8}\approx 2.2255$ — the odds of lapsing multiply by about $2.23$.
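The same numbers can be checked in code. This is a sketch with hypothetical helper names (`inv_logit`, `odds`); it also confirms that the ratio of odds at $x=1$ and $x=0$ equals $e^{0.8}$.

```python
import math

def inv_logit(eta):
    """Inverse logit link: maps any real eta to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    return p / (1.0 - p)

beta0, beta1 = -1.5, 0.8
p0 = inv_logit(beta0)            # lapse probability at x = 0
p1 = inv_logit(beta0 + beta1)    # lapse probability at x = 1

odds_ratio = odds(p1) / odds(p0)
print(round(p1, 4), round(odds_ratio, 4))   # about 0.3318 and 2.2255
```

Note that the odds multiply by $e^{\beta_1}$, but the probability itself does not change by a fixed factor.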
Estimation & IRLS
How are GLM coefficients **estimated**, and why not ordinary least squares?
GLM coefficients are estimated by **maximum likelihood** — choose $\beta$ to maximize the EDF log-likelihood. OLS minimizes squared error, which is the MLE **only** for the normal/identity case. For non-normal responses (Poisson, gamma, binomial) the variance depends on the mean and the link is nonlinear, so the likelihood equations have no closed form and must be solved numerically. The standard algorithm is **iteratively reweighted least squares (IRLS)**.
Estimation & IRLS
Describe the **IRLS** algorithm for fitting a GLM.
Iteratively reweighted least squares is Fisher scoring applied to the GLM score equations. Each iteration: 1. Form the **adjusted response** $z_i=\eta_i+(y_i-\mu_i)g'(\mu_i)$. 2. Form weights $w_i=\frac{1}{V(\mu_i)\,[g'(\mu_i)]^{2}}$. 3. Solve the **weighted least squares** problem regressing $z$ on $X$ with weights $w$ to update $\beta$. 4. Recompute $\eta=X\beta$, $\mu=g^{-1}(\eta)$, and the weights; repeat until $\beta$ converges. The weights and working response are recomputed every step — hence "iteratively reweighted."
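The four steps can be sketched for the simplest useful case, a Poisson response with log link and one covariate. This is a pure-Python illustration, not a production fitter: the function name `irls_poisson_log`, the starting values, and the small data set are all assumptions made for the example.

```python
import math

def irls_poisson_log(x, y, tol=1e-10, max_iter=50):
    """Fit ln(mu) = b0 + b1*x by IRLS for a Poisson response.

    For the log link g'(mu) = 1/mu and V(mu) = mu, so the working weight
    is w = 1 / (V * g'^2) = mu and the adjusted response is
    z = eta + (y - mu) / mu.
    """
    mu = [max(yi, 0.5) for yi in y]      # start near the data, avoiding log(0)
    eta = [math.log(m) for m in mu]
    b0 = b1 = 0.0
    for _ in range(max_iter):
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Weighted least squares of z on (1, x): solve the 2x2 normal equations.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        # Recompute the linear predictor and fitted means for the next pass.
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        if converged:
            break
    return b0, b1, mu

# Made-up illustrative data.
x = [0, 1, 2, 3, 4, 5]
y = [1, 2, 2, 4, 6, 9]
b0, b1, mu = irls_poisson_log(x, y)
print(b0, b1)
```

At convergence the fitted means satisfy $\sum y_i=\sum\hat\mu_i$ and $\sum x_i y_i=\sum x_i\hat\mu_i$, which is a convenient sanity check because the log link is canonical for the Poisson.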
Estimation & IRLS
Write the **score (likelihood) equations** that the MLE of $\beta$ solves in a GLM.
For each coefficient $j$, $\sum_{i=1}^{n}\frac{(y_i-\mu_i)\,x_{ij}}{V(\mu_i)\,g'(\mu_i)}=0$ (the $\phi$ cancels). These are nonlinear in $\beta$ because $\mu_i=g^{-1}(x_i^{\top}\beta)$. Under the **canonical link** they simplify to $X^{\top}y=X^{\top}\mu$, i.e. the model reproduces the covariate-weighted totals of the data — for an intercept that forces $\sum y_i=\sum\mu_i$.
Estimation & IRLS
For a Poisson GLM with **only an intercept** and log link, show that $\hat\mu=\bar y$. Data: $y=(2,4,3,5,6)$.
Canonical-link score equation forces $\sum y_i=\sum\mu_i$. With one parameter every fitted value equals $\hat\mu$, so $\sum y_i=n\hat\mu\Rightarrow\hat\mu=\bar y$. $\sum y_i=2+4+3+5+6=20$, $n=5$, so $\hat\mu=\frac{20}{5}=4$. The fitted intercept is $\hat\beta_0=\ln 4\approx 1.3863$ since $e^{\hat\beta_0}=\hat\mu=4$.
Estimation & IRLS
Why does the **dispersion parameter** $\phi$ not affect the point estimates $\hat\beta$, but does affect their standard errors?
$\phi$ cancels out of the score equations $\sum\frac{(y_i-\mu_i)x_{ij}}{V(\mu_i)g'(\mu_i)}=0$, so the location of the maximum, $\hat\beta$, does not depend on $\phi$. But the covariance of $\hat\beta$ is $\widehat{\text{Cov}}(\hat\beta)=\phi\,(X^{\top}WX)^{-1}$, where $W=\text{diag}(w_i)$ holds the IRLS weights $w_i=\frac{1}{V(\mu_i)[g'(\mu_i)]^{2}}$, and this scales with $\phi$. So an under- or over-estimated $\phi$ leaves the fitted means unchanged but distorts standard errors, confidence intervals, and significance tests.
Deviance & residuals
Define the **deviance** $D$ of a fitted GLM.
$D=2\phi\,(\ell_{\text{sat}}-\ell_{\text{model}})$, where $\ell_{\text{sat}}$ is the maximized log-likelihood of the **saturated model** (one parameter per observation, $\hat\mu_i=y_i$) and $\ell_{\text{model}}$ is that of the fitted model. The deviance is a likelihood-ratio measure of how far the fitted model is from a perfect fit — the GLM analogue of the residual sum of squares. Smaller deviance means better fit.
Deviance & residuals
Distinguish **deviance** $D$ from **scaled deviance** $D^{*}$.
The (unscaled) **deviance** is $D=2\phi\,(\ell_{\text{sat}}-\ell_{\text{model}})$, which carries the scale of $\phi$; for normal data it is the residual sum of squares, $D=\sum(y_i-\hat\mu_i)^2$. The **scaled deviance** is $D^{*}=\frac{D}{\phi}=2(\ell_{\text{sat}}-\ell_{\text{model}})$, which is free of $\phi$. It is $D^{*}$ that is (asymptotically) $\chi^{2}$ distributed. When $\phi=1$ (Poisson, binomial) the two coincide.
Deviance & residuals
Give the **deviance contribution** for a Poisson observation and compute the total deviance. Data $y=(3,5,2)$, fitted $\hat\mu=(4,4,4)$.
Poisson deviance: $D=2\sum\left[y_i\ln\frac{y_i}{\hat\mu_i}-(y_i-\hat\mu_i)\right]$ (with $\phi=1$). Term 1: $3\ln\frac{3}{4}-(3-4)=3(-0.287682)+1=-0.863+1=0.137$. Term 2: $5\ln\frac{5}{4}-(5-4)=5(0.223144)-1=1.116-1=0.116$. Term 3: $2\ln\frac{2}{4}-(2-4)=2(-0.693147)+2=-1.386+2=0.614$. Carrying unrounded terms, $D=2(0.137+0.116+0.614)\approx 2(0.8664)\approx 1.733$.
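The deviance arithmetic can be confirmed with a short function. This is a sketch; the name `poisson_deviance` is hypothetical, and the convention $y\ln(y/\hat\mu)=0$ at $y=0$ is the usual limiting value rather than something stated on the card.

```python
import math

def poisson_deviance(y, mu):
    """Total Poisson deviance 2 * sum[y*ln(y/mu) - (y - mu)] with phi = 1."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # y*ln(y/mu) is taken as 0 when y = 0 (its limiting value).
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

D = poisson_deviance([3, 5, 2], [4, 4, 4])
print(round(D, 3))   # about 1.733
```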
Deviance & residuals
Give the **Pearson** $\chi^{2}$ statistic and the **Pearson residual**.
Pearson residual: $r_i^{P}=\frac{y_i-\hat\mu_i}{\sqrt{V(\hat\mu_i)}}$. Pearson $\chi^{2}=\sum_{i=1}^{n}\frac{(y_i-\hat\mu_i)^{2}}{V(\hat\mu_i)}=\sum (r_i^{P})^{2}$. It is an alternative to the deviance for assessing fit; under a correct model the scaled version $\frac{X^{2}}{\phi}$ is approximately $\chi^{2}_{n-p}$.
Deviance & residuals
Compute the **Pearson** $\chi^{2}$ for a Poisson fit with $y=(3,5,2)$, $\hat\mu=(4,4,4)$.
For Poisson, $V(\hat\mu_i)=\hat\mu_i$, so $X^{2}=\sum\frac{(y_i-\hat\mu_i)^{2}}{\hat\mu_i}$. $\frac{(3-4)^2}{4}=\frac{1}{4}=0.25$. $\frac{(5-4)^2}{4}=\frac{1}{4}=0.25$. $\frac{(2-4)^2}{4}=\frac{4}{4}=1.00$. $X^{2}=0.25+0.25+1.00=1.50$. This is close to the deviance $1.733$ from the same data: the two statistics are asymptotically equivalent and usually give similar values unless the fitted means are small.
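A generic Pearson statistic only needs the variance function. In this sketch the function name `pearson_chi2` and its `variance` argument are hypothetical; the default $V(\mu)=\mu$ gives the Poisson case worked above.

```python
def pearson_chi2(y, mu, variance=lambda m: m):
    """Pearson chi-square: sum of squared Pearson residuals (y - mu)^2 / V(mu)."""
    return sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))

y, mu = [3, 5, 2], [4, 4, 4]
x2_poisson = pearson_chi2(y, mu)                            # V(mu) = mu
x2_gamma = pearson_chi2(y, mu, variance=lambda m: m * m)    # V(mu) = mu^2
print(x2_poisson, x2_gamma)
```

Swapping the variance function changes the statistic for the same residuals, which is why $V(\mu)$ has to match the assumed distribution.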
Deviance & residuals
Define the **deviance residual** $r_i^{D}$ and how it relates to $D$.
$r_i^{D}=\text{sign}(y_i-\hat\mu_i)\sqrt{d_i}$, where $d_i\ge 0$ is observation $i$'s contribution to the deviance, so $D=\sum_i d_i=\sum_i (r_i^{D})^{2}$. Deviance residuals are usually closer to normality than Pearson residuals, making them the preferred residual for diagnostic plots. Each one shows how much a single observation worsens the overall fit, signed by direction.
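For the Poisson case the deviance residuals follow directly from the per-observation contributions $d_i$. The sketch below uses a hypothetical helper name and the data $y=(3,5,2)$, $\hat\mu=(4,4,4)$ from the neighboring cards.

```python
import math

def poisson_deviance_residuals(y, mu):
    """Signed square roots of each observation's Poisson deviance contribution."""
    residuals = []
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        # Guard against a tiny negative value caused by round-off.
        d_i = max(2.0 * (log_term - (yi - mi)), 0.0)
        sign = 1.0 if yi >= mi else -1.0
        residuals.append(sign * math.sqrt(d_i))
    return residuals

r = poisson_deviance_residuals([3, 5, 2], [4, 4, 4])
D = sum(ri ** 2 for ri in r)
print([round(ri, 3) for ri in r], round(D, 3))
```

The squared residuals add back up to the total deviance, and each sign shows whether the observation sits above or below its fitted mean.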
Model selection & diagnostics
How is the **deviance** used to test **nested models** (the drop-in-deviance test)?
For a smaller model $M_0$ nested in a larger model $M_1$ with $\phi$ known (e.g. $\phi=1$), under $H_0$ (the extra terms are zero): $\Delta D^{*}=D^{*}_{0}-D^{*}_{1}\;\dot\sim\;\chi^{2}_{\;\Delta p}$, where $\Delta p$ is the number of extra parameters in $M_1$. Reject $H_0$ (keep the larger model) when $\Delta D^{*}$ exceeds the $\chi^{2}_{\Delta p}$ critical value. This is a likelihood-ratio test in deviance form.
Model selection & diagnostics
A Poisson GLM ($\phi=1$): reduced model deviance $D_0=45.2$ on $30$ df; full model deviance $D_1=33.0$ on $27$ df. Test the $3$ added terms at $\alpha=0.05$ ($\chi^{2}_{3,0.95}=7.815$).
Drop-in-deviance statistic: $\Delta D=D_0-D_1=45.2-33.0=12.2$. Degrees of freedom: $\Delta p=30-27=3$. Compare $12.2$ to $\chi^{2}_{3,0.95}=7.815$. Since $12.2>7.815$, **reject** $H_0$: the three added predictors significantly improve the fit, so keep the fuller model.
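The test can be scripted as below. This is a sketch with hypothetical function names; the critical value is the one quoted on the card, and the closed-form p-value is an added convenience that holds only for a $\chi^{2}$ with exactly 3 degrees of freedom.

```python
import math

def drop_in_deviance(dev_reduced, df_reduced, dev_full, df_full):
    """Return the drop-in-deviance statistic and its degrees of freedom."""
    return dev_reduced - dev_full, df_reduced - df_full

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square with 3 df (closed form, 3 df only)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

stat, df = drop_in_deviance(45.2, 30, 33.0, 27)
CRITICAL_CHI2_3_95 = 7.815            # value quoted on the card
reject = stat > CRITICAL_CHI2_3_95
p_value = chi2_sf_3df(stat)

print(round(stat, 1), df, reject, p_value)
```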
Model selection & diagnostics
When $\phi$ must be **estimated** (e.g. gamma, normal), how do you compare nested GLMs?
Use an **F-test** instead of the $\chi^{2}$ drop-in-deviance, because dividing by an estimated $\phi$ introduces extra sampling variability: $F=\frac{(D_0-D_1)/\Delta p}{D_1/(n-p_1)}=\frac{(D_0-D_1)/(p_1-p_0)}{\hat\phi}$, where $\hat\phi=\frac{D_1}{n-p_1}$ (or the Pearson estimate). Compare to $F_{\Delta p,\;n-p_1}$. Large $F$ favors the larger model.
Model selection & diagnostics
A gamma GLM: $D_0=120$ ($p_0=4$), $D_1=96$ ($p_1=7$), $n=50$. Carry out the F-test for the $3$ extra terms ($F_{3,43,0.95}\approx 2.82$).
Estimate dispersion from the full model: $\hat\phi=\frac{D_1}{n-p_1}=\frac{96}{50-7}=\frac{96}{43}\approx 2.2326$. Numerator: $\frac{D_0-D_1}{p_1-p_0}=\frac{120-96}{3}=\frac{24}{3}=8$. $F=\frac{8}{2.2326}\approx 3.583$. Since $3.583>F_{3,43,0.95}\approx 2.82$, **reject** $H_0$: the three added covariates significantly improve the gamma model.
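The F statistic is a two-line calculation once the deviances and parameter counts are known. In this sketch the function name `nested_f_stat` is hypothetical and the critical value is the approximate one given on the card.

```python
def nested_f_stat(dev_reduced, p_reduced, dev_full, p_full, n):
    """F statistic for nested GLMs, with phi estimated from the full model's deviance."""
    phi_hat = dev_full / (n - p_full)
    f_stat = ((dev_reduced - dev_full) / (p_full - p_reduced)) / phi_hat
    return f_stat, phi_hat

f_stat, phi_hat = nested_f_stat(120, 4, 96, 7, 50)
F_CRITICAL = 2.82                     # approximate F(3, 43) 95th percentile from the card
print(round(phi_hat, 4), round(f_stat, 3), f_stat > F_CRITICAL)
```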
Model selection & diagnostics
Define **AIC** and **BIC** and state how to use them in GLM selection.
$\text{AIC}=-2\ell(\hat\beta)+2p$ and $\text{BIC}=-2\ell(\hat\beta)+p\ln n$, where $p$ is the number of estimated parameters and $n$ the sample size. Both reward fit (small $-2\ell$) and penalize complexity; **lower is better**. BIC's per-parameter penalty $\ln n$ exceeds AIC's $2$ once $n\ge 8$ (since $e^{2}\approx 7.39$), so BIC favors **smaller** models. Unlike the drop-in-deviance test, they can compare **non-nested** models.
Model selection & diagnostics
Model A has $-2\ell=210$ with $p=5$; Model B has $-2\ell=204$ with $p=9$; $n=100$. Pick the better model by **AIC** and by **BIC**.
**AIC:** $A=210+2(5)=220$; $B=204+2(9)=222$. Model **A** wins (lower AIC). **BIC** ($\ln 100\approx 4.60517$): $A=210+5(4.60517)=210+23.03=233.03$; $B=204+9(4.60517)=204+41.45=245.45$. Model **A** wins again, by a wider margin since BIC penalizes B's extra parameters harder. The $6$-point drop in $-2\ell$ from B does not justify its $4$ extra parameters.
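The comparison can be reproduced with two small helpers. This is a sketch; the function names `aic` and `bic` and the dictionary layout are illustrative choices, not part of the deck.

```python
import math

def aic(neg2_loglik, p):
    return neg2_loglik + 2 * p

def bic(neg2_loglik, p, n):
    return neg2_loglik + p * math.log(n)

n = 100
models = {"A": (210, 5), "B": (204, 9)}   # name -> (-2 log-likelihood, parameters)

aic_scores = {name: aic(m2l, p) for name, (m2l, p) in models.items()}
bic_scores = {name: bic(m2l, p, n) for name, (m2l, p) in models.items()}

best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)
print(aic_scores, bic_scores, best_aic, best_bic)
```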
Model selection & diagnostics
What is **overdispersion**, how do you detect it, and why does it matter?
Overdispersion is when the data's variance exceeds the value the assumed distribution implies — e.g. a Poisson model where $\text{Var}(Y)>\mu$. A quick check: the ratio $\frac{X^{2}}{n-p}$ (or $\frac{D}{n-p}$) is well above $1$. If ignored it leaves $\hat\beta$ unchanged but makes **standard errors too small**, so terms look more significant than they are. Remedies: an estimated dispersion $\hat\phi$ (quasi-Poisson) or a different distribution (negative binomial).
Model selection & diagnostics
A Poisson GLM has Pearson $X^{2}=180$ on $n-p=60$ degrees of freedom. Estimate the **overdispersion factor** and adjust the standard errors.
Dispersion estimate: $\hat\phi=\frac{X^{2}}{n-p}=\frac{180}{60}=3.0$, well above $1$ — clear overdispersion. The quasi-Poisson correction multiplies the naive covariance by $\hat\phi$, so each standard error is inflated by $\sqrt{\hat\phi}=\sqrt{3}\approx 1.732$. A coefficient with naive SE $0.10$ should be reported with SE $\approx 0.173$; its $z$-statistic shrinks by the same factor $1.732$, tempering false significance.
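The quasi-Poisson adjustment amounts to one division and one square root. In the sketch below the function name `quasi_poisson_adjust` is hypothetical, and the coefficient value 0.25 used for the $z$-statistics is a made-up illustration.

```python
import math

def quasi_poisson_adjust(pearson_x2, resid_df, naive_se):
    """Estimate phi from Pearson X^2 and inflate a naive standard error by sqrt(phi)."""
    phi_hat = pearson_x2 / resid_df
    return phi_hat, naive_se * math.sqrt(phi_hat)

phi_hat, adjusted_se = quasi_poisson_adjust(180, 60, 0.10)

beta_hat = 0.25                       # made-up coefficient for illustration
z_naive = beta_hat / 0.10
z_adjusted = beta_hat / adjusted_se
print(phi_hat, round(adjusted_se, 3), z_naive, round(z_adjusted, 3))
```

With these illustrative numbers the coefficient looks significant at the 5% level before the correction ($z=2.5$) and not after it ($z\approx 1.44$).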
Estimation & IRLS
What is **quasi-likelihood**, and when is it useful?
Quasi-likelihood specifies only a **mean–variance relationship** $\text{Var}(Y)=\phi\,V(\mu)$ and a link, without naming a full EDF distribution. The quasi-score equations are the same as a GLM's, $\sum\frac{(y_i-\mu_i)x_{ij}}{\phi V(\mu_i)g'(\mu_i)}=0$, so $\hat\beta$ is found the same way. It is used for **overdispersion** (quasi-Poisson with $V(\mu)=\mu$ but $\phi$ estimated $>1$) when no exact distribution fits, giving valid estimates and corrected standard errors.
Estimation & IRLS
Estimate $\phi$ two ways for a gamma GLM. Deviance $D=86.4$, Pearson $X^{2}=78.0$, $n=44$, $p=4$.
Residual degrees of freedom: $n-p=44-4=40$. **Deviance estimate:** $\hat\phi_D=\frac{D}{n-p}=\frac{86.4}{40}=2.16$. **Pearson estimate:** $\hat\phi_P=\frac{X^{2}}{n-p}=\frac{78.0}{40}=1.95$. The Pearson estimate is the more common default for the gamma dispersion. Both feed into $\widehat{\text{Cov}}(\hat\beta)=\hat\phi(X^{\top}WX)^{-1}$ for standard errors.
Model selection & diagnostics
Use the **Wald test** to assess a single coefficient. A GLM gives $\hat\beta_1=0.52$ with standard error $0.18$. Test $H_0:\beta_1=0$ at $\alpha=0.05$.
Wald statistic: $z=\frac{\hat\beta_1}{\text{SE}}=\frac{0.52}{0.18}\approx 2.889$. Compare to the standard-normal critical value $z_{0.975}=1.96$. Since $2.889>1.96$, **reject** $H_0$ — $\beta_1$ is significantly nonzero. Approx. $95\%$ CI for $\beta_1$: $0.52\pm 1.96(0.18)=0.52\pm 0.3528=(0.167,\,0.873)$; exponentiating gives a relativity CI of $(e^{0.167},e^{0.873})\approx(1.18,\,2.39)$.
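The Wald calculation, including the relativity interval, can be sketched as follows. The function name `wald_test` is hypothetical, and the two-sided p-value (computed from the standard normal tail via `math.erfc`) is an addition not shown on the card.

```python
import math

def wald_test(beta_hat, se, z_crit=1.96):
    """Wald z-statistic, approximate 95% CI, and two-sided normal p-value."""
    z = beta_hat / se
    ci = (beta_hat - z_crit * se, beta_hat + z_crit * se)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, ci, p_value

z, ci, p_value = wald_test(0.52, 0.18)
relativity_ci = (math.exp(ci[0]), math.exp(ci[1]))   # CI on the multiplicative scale

print(round(z, 3), ci, relativity_ci, p_value)
```

Because the interval for $\beta_1$ excludes $0$, the exponentiated interval excludes a relativity of $1$.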
Model selection & diagnostics
Why are **deviance/AIC differences** preferred over $R^{2}$ for comparing GLMs?
Ordinary $R^{2}$ is built on the normal-model residual sum of squares and has no clean meaning when the variance changes with the mean and the link is nonlinear. GLM fit is instead judged by **likelihood-based** measures: deviance (likelihood-ratio fit), drop-in-deviance / F-tests for nested comparisons, and AIC/BIC for penalized comparison (including non-nested models). Pseudo-$R^{2}$ measures exist but are descriptive only; inference rests on the deviance and information criteria.
Model selection & diagnostics
In a gamma severity GLM with **log link**, the residual deviance for the **null** (intercept-only) model is $58.0$ and for the **fitted** model is $40.0$. Comment on how much variation the predictors explain.
The drop in deviance from adding predictors is $58.0-40.0=18.0$ on $\Delta p$ degrees of freedom — a deviance-based analogue of the explained sum of squares. A "deviance pseudo-$R^{2}$" is $1-\frac{40.0}{58.0}=1-0.6897\approx 0.310$, i.e. the covariates account for about $31\%$ of the null deviance. Whether the $18.0$ drop is significant is judged by a drop-in-deviance F-test (gamma $\phi$ is estimated), not by the percentage alone.
Insurance applications
Why does the **gamma GLM with log link** pair naturally with the **Poisson GLM with log link** in a frequency–severity pure-premium model?
Frequency (counts per exposure) is modeled with a **Poisson log-link** GLM (variance $\propto\mu$, offset $\ln$ exposure); severity (average cost per claim) is modeled with a **gamma log-link** GLM (variance $\propto\mu^{2}$, so larger claims are noisier — a realistic right-skewed cost shape). Both being multiplicative, the **pure premium** is their product: $\widehat{\text{PP}}=\hat\mu_{\text{freq}}\times\hat\mu_{\text{sev}}$, and the combined relativity for a factor is the product of its frequency and severity relativities.
Insurance applications
A multiplicative rating model has frequency relativity $1.30$ and severity relativity $0.90$ for "young driver," with a base pure premium of $\$500$. Find the young-driver pure premium and its overall relativity.
Overall pure-premium relativity $=$ frequency relativity $\times$ severity relativity $=1.30\times 0.90=1.17$. Young-driver pure premium $=\$500\times 1.17=\$585$. So despite a $10\%$ lower average severity, the $30\%$ higher frequency dominates, giving a net $17\%$ surcharge — exactly the product rule a log-link (multiplicative) GLM enforces.
Link & variance functions
Compare the **canonical link** with a **non-canonical** link choice in practice (e.g. gamma severity).
The gamma's canonical link is the **inverse** $g(\mu)=\mu^{-1}$, which guarantees a concave log-likelihood and slightly simpler estimating equations. But actuaries usually fit gamma severities with the **log link** instead: it keeps $\mu>0$, gives interpretable **multiplicative relativities** $e^{\beta_j}$, and matches the rating-plan structure. The exam point: the canonical link is mathematically convenient, but the link should be chosen for the application — log link for multiplicative pricing regardless of canonical status.