Exam SRM — Generalized Linear Models Flashcards

Generalized linear models for SOA Exam SRM: the three GLM components and the linear-exponential family, canonical and non-canonical link functions, logistic regression with odds ratios and predicted probabilities, Poisson regression with log links, offsets and overdispersion, maximum-likelihood estimation by IRLS, deviance and residuals, and nested-model comparison via the drop-in-deviance likelihood-ratio test, AIC, and BIC — with fully worked numeric examples.

43 cards · 6 topics · Free · fact-checked · LaTeX math

Link functions
Name the **three components** of a generalized linear model (GLM).
1. **Random component:** the response $Y$ has a distribution in the **linear-exponential family** (normal, binomial, Poisson, gamma, inverse Gaussian), with mean $\mu=E[Y]$. 2. **Systematic component:** a **linear predictor** $\eta=X\beta=\beta_0+\beta_1x_1+\cdots+\beta_px_p$. 3. **Link function** $g$: a monotonic, differentiable function tying the two together via $g(\mu)=\eta$, so $\mu=g^{-1}(\eta)$.
Exponential family
Write the **linear-exponential family** density and identify $\theta$ and $\phi$.
$f(y;\theta,\phi)=\exp\!\left\{\dfrac{y\theta-b(\theta)}{\phi}+c(y,\phi)\right\}$. $\theta$ is the **natural (canonical) parameter**, which depends on the mean. $\phi$ is the **dispersion parameter** (scale), often known or estimated separately. $b(\theta)$ is the **cumulant function** and $c(y,\phi)$ a normalizing term not involving $\theta$.
Exponential family
In the linear-exponential family, give the **mean** and **variance** of $Y$ in terms of $b(\theta)$.
$E[Y]=\mu=b'(\theta)$ — the mean is the first derivative of the cumulant function. $\text{Var}(Y)=\phi\,b''(\theta)$ — the variance is the dispersion times the second derivative. Writing $b''(\theta)$ as a function of $\mu$ gives the **variance function** $V(\mu)$, so $\text{Var}(Y)=\phi\,V(\mu)$.
Exponential family
What is the **variance function** $V(\mu)$ for the normal, Poisson, binomial, and gamma GLMs?
**Normal:** $V(\mu)=1$ (variance constant, $\phi=\sigma^{2}$). **Poisson:** $V(\mu)=\mu$ (variance equals the mean, $\phi=1$). **Binomial** (proportion form): $V(\mu)=\mu(1-\mu)$. **Gamma:** $V(\mu)=\mu^{2}$ (constant coefficient of variation). The variance function is the signature that identifies the random-component distribution.
Exponential family
Show that the **Poisson** distribution belongs to the linear-exponential family and read off $\theta$, $b(\theta)$, and $\phi$.
$f(y)=\dfrac{e^{-\mu}\mu^{y}}{y!}=\exp\{y\ln\mu-\mu-\ln y!\}$. Matching $\dfrac{y\theta-b(\theta)}{\phi}+c(y,\phi)$ with $\phi=1$: $\theta=\ln\mu$, so $\mu=e^{\theta}$ and $b(\theta)=e^{\theta}=\mu$. Check: $b'(\theta)=e^{\theta}=\mu$ ✓ (mean) and $b''(\theta)=e^{\theta}=\mu$, so $\text{Var}(Y)=\mu$ ✓. The canonical parameter is the **log mean**, which is why the **log link** is canonical for Poisson.
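The matching above can be checked numerically. The following is a minimal sketch, assuming an illustrative mean of $\mu=2.5$; the function names are hypothetical and chosen only for this example. It compares the usual Poisson pmf with the exponential-family form and approximates $b'(\theta)$ by a central difference.

```python
import math

def poisson_pmf(y, mu):
    """Standard Poisson probability mass function."""
    return math.exp(-mu) * mu**y / math.factorial(y)

def poisson_expfam(y, mu):
    """Same pmf written as exp{(y*theta - b(theta))/phi + c(y, phi)}."""
    theta = math.log(mu)        # natural parameter: the log mean
    b = math.exp(theta)         # cumulant function b(theta) = e^theta
    phi = 1.0                   # Poisson dispersion
    c = -math.lgamma(y + 1)     # c(y, phi) = -ln(y!)
    return math.exp((y * theta - b) / phi + c)

def b_prime(theta, h=1e-5):
    """Central-difference approximation to b'(theta) with b = exp."""
    return (math.exp(theta + h) - math.exp(theta - h)) / (2 * h)

mu = 2.5                        # illustrative value (assumption)
theta = math.log(mu)
pmf_gap = max(abs(poisson_pmf(y, mu) - poisson_expfam(y, mu)) for y in range(15))
mean_from_b = b_prime(theta)    # should recover mu
```

Both forms agree to floating-point precision, and the numerical derivative of $b$ returns the mean, as the card states.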
Link functions
Define the **link function** $g$ and the **canonical link**.
A link $g$ relates the mean to the linear predictor: $g(\mu)=\eta=X\beta$, with inverse (mean) function $\mu=g^{-1}(\eta)$. The **canonical link** sets the linear predictor equal to the natural parameter, $g(\mu)=\theta$. It gives convenient algebra (the sufficient statistic is $X^{\top}y$) but is not required — any monotone differentiable $g$ mapping to the correct range is admissible.
Link functions
List the **canonical link** for the normal, binomial, and Poisson GLMs.
**Normal:** identity, $g(\mu)=\mu$ — ordinary linear regression. **Binomial:** logit, $g(\mu)=\ln\dfrac{\mu}{1-\mu}$ (here $\mu=p$). **Poisson:** log, $g(\mu)=\ln\mu$. These are the defaults because each equals the distribution's natural parameter $\theta$.
Link functions
Why use a **link function** at all instead of regressing $\mu$ directly on $X\beta$?
Because $\eta=X\beta$ ranges over all of $\mathbb{R}$, but $\mu$ is often constrained: a probability must lie in $(0,1)$ and a count rate must be positive. The link maps the constrained mean onto the whole real line so the linear predictor never produces an impossible value. E.g. the **logit** maps $(0,1)\to(-\infty,\infty)$ and the **log** maps $(0,\infty)\to(-\infty,\infty)$; inverting always returns a legal mean.
Link functions
Name three common link functions for **binomial** GLMs and their inverse-link (mean) forms.
**Logit:** $\eta=\ln\dfrac{p}{1-p}$, inverse $p=\dfrac{1}{1+e^{-\eta}}$ (canonical). **Probit:** $\eta=\Phi^{-1}(p)$, inverse $p=\Phi(\eta)$, using the standard normal CDF. **Complementary log-log:** $\eta=\ln(-\ln(1-p))$, inverse $p=1-e^{-e^{\eta}}$ (asymmetric). Logit is canonical and gives interpretable odds ratios; probit and clog-log are alternatives with different tail behavior.
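The three link/inverse-link pairs can be sketched in a few lines of Python. The function names are hypothetical; the probit pair relies on `statistics.NormalDist` from the standard library for the standard normal CDF and its inverse.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))

def probit(p):
    return _STD_NORMAL.inv_cdf(p)

def inv_probit(eta):
    return _STD_NORMAL.cdf(eta)

def cloglog(p):
    return math.log(-math.log(1 - p))

def inv_cloglog(eta):
    return 1 - math.exp(-math.exp(eta))

p = 0.3  # illustrative probability (assumption)
round_trips = {
    "logit": inv_logit(logit(p)),
    "probit": inv_probit(probit(p)),
    "cloglog": inv_cloglog(cloglog(p)),
}

# At eta = 0 the symmetric links give 0.5; cloglog gives 1 - exp(-1).
p_at_zero = (inv_logit(0.0), inv_probit(0.0), inv_cloglog(0.0))
```

Each inverse undoes its link, and evaluating at $\eta=0$ shows the asymmetry of the complementary log-log link relative to logit and probit.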
Logistic regression
Define the **logistic regression** model and state its link.
Logistic regression models a binary/Bernoulli response with $p=P(Y=1\mid x)$ via the **logit link**: $\ln\dfrac{p}{1-p}=\eta=\beta_0+\beta_1x_1+\cdots+\beta_px_p$. Inverting, the fitted probability is $\hat p=\dfrac{1}{1+e^{-\hat\eta}}=\dfrac{e^{\hat\eta}}{1+e^{\hat\eta}}$, the **logistic (sigmoid)** function, always in $(0,1)$.
Logistic regression
In logistic regression, define the **odds** and the **odds ratio** $e^{\beta_1}$.
The **odds** of the event are $\dfrac{p}{1-p}=e^{\eta}=e^{\beta_0+\beta_1x}$. Increasing $x$ by one unit multiplies the odds by $e^{\beta_1}$, so $e^{\beta_1}$ is the **odds ratio** per unit increase in $x$. $e^{\beta_1}>1$ means higher odds (positive effect); $e^{\beta_1}<1$ means lower odds; $e^{\beta_1}=1$ ($\beta_1=0$) means $x$ has no effect.
Logistic regression
A fitted logistic model gives $\hat\eta=-2.0+0.5x$. For an applicant with $x=4$, find the predicted **probability** $\hat p$.
Linear predictor: $\hat\eta=-2.0+0.5(4)=0.0$. $\hat p=\dfrac{1}{1+e^{-\hat\eta}}=\dfrac{1}{1+e^{0}}=\dfrac{1}{2}=0.5$. With $\hat\eta=0$ the odds are $e^{0}=1$, an even-money event, so $\hat p=0.5$ exactly.
Logistic regression
A logistic model gives $\hat\eta=-1.2+0.8x_1-0.3x_2$. For $x_1=3$, $x_2=2$, find the predicted probability.
$\hat\eta=-1.2+0.8(3)-0.3(2)=-1.2+2.4-0.6=0.6$. $e^{-0.6}\approx 0.548812$. $\hat p=\dfrac{1}{1+e^{-0.6}}=\dfrac{1}{1+0.548812}=\dfrac{1}{1.548812}\approx 0.6457$. So the model predicts about a $64.6\%$ chance of the event.
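The two predicted-probability calculations above follow the same recipe: form the linear predictor, then apply the sigmoid. A minimal sketch, with a hypothetical helper `predict_probability`:

```python
import math

def predict_probability(intercept, coefs, xs):
    """Return (eta, p) for a logistic model: eta = b0 + sum(b_j * x_j)."""
    eta = intercept + sum(b * x for b, x in zip(coefs, xs))
    p = 1 / (1 + math.exp(-eta))   # inverse logit (sigmoid)
    return eta, p

# One-predictor card: eta = -2.0 + 0.5 * 4
eta_a, p_a = predict_probability(-2.0, [0.5], [4])

# Two-predictor card: eta = -1.2 + 0.8 * 3 - 0.3 * 2
eta_b, p_b = predict_probability(-1.2, [0.8, -0.3], [3, 2])
```

Here `p_a` is one half because the linear predictor is zero, and `p_b` comes out at roughly 64.6%.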
Logistic regression
A logistic regression of default on credit score has $\hat\beta_1=-0.04$ per point. Interpret the coefficient and give the odds ratio for a **10-point** increase.
Per one-point increase, the odds of default multiply by $e^{-0.04}\approx 0.9608$ — a $3.92\%$ drop in odds. For a $10$-point increase the odds multiply by $e^{-0.04\times 10}=e^{-0.4}\approx 0.6703$. That is, raising the score $10$ points cuts the odds of default by about $1-0.6703=33.0\%$.
Logistic regression
A logistic model has intercept $\hat\beta_0=-1.5$ and a smoker indicator coefficient $\hat\beta_1=0.9$. Compare the predicted **probabilities** for non-smokers and smokers.
**Non-smoker** ($x=0$): $\hat\eta=-1.5$, $\hat p=\dfrac{1}{1+e^{1.5}}=\dfrac{1}{1+4.481689}\approx 0.1824$. **Smoker** ($x=1$): $\hat\eta=-1.5+0.9=-0.6$, $\hat p=\dfrac{1}{1+e^{0.6}}=\dfrac{1}{1+1.822119}\approx 0.3543$. The odds ratio is $e^{0.9}\approx 2.460$: smokers' odds are about $2.46\times$ those of non-smokers (note the probability ratio $0.3543/0.1824\approx 1.94$ is smaller than the odds ratio).
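The gap between the odds ratio and the probability ratio is easy to reproduce. This is a small sketch using the card's coefficients; the helper names `sigmoid` and `odds` are assumptions for illustration.

```python
import math

def sigmoid(eta):
    return 1 / (1 + math.exp(-eta))

def odds(p):
    return p / (1 - p)

beta0, beta_smoker = -1.5, 0.9

p_non = sigmoid(beta0)                    # x = 0
p_smoker = sigmoid(beta0 + beta_smoker)   # x = 1

odds_ratio = odds(p_smoker) / odds(p_non)  # equals exp(beta_smoker)
prob_ratio = p_smoker / p_non              # depends on the baseline p
```

The odds ratio equals $e^{0.9}$ regardless of the intercept, whereas the probability ratio would change if $\hat\beta_0$ changed.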
Logistic regression
A logistic model predicts $\hat p=0.30$ for a policyholder. What are the predicted **odds** and **log-odds**?
**Odds** $=\dfrac{\hat p}{1-\hat p}=\dfrac{0.30}{0.70}\approx 0.4286$ (about $3$-to-$7$ in favor; equivalently $7$-to-$3$ against). **Log-odds (logit)** $=\ln(0.4286)\approx -0.8473$, which equals the linear predictor $\hat\eta$. Check: $\hat p=\dfrac{1}{1+e^{0.8473}}=\dfrac{1}{1+2.3333}=0.30$ ✓.
Poisson regression
Define **Poisson regression** and state its model equation with the canonical link.
Poisson regression models a **count** response $Y$ (claims, accidents) with mean $\mu>0$ using the **log link**: $\ln\mu=\eta=\beta_0+\beta_1x_1+\cdots+\beta_px_p$, so $\mu=e^{\eta}=e^{\beta_0}\,e^{\beta_1x_1}\cdots e^{\beta_px_p}$. The response is assumed $Y\sim\text{Poisson}(\mu)$, so $\text{Var}(Y)=\mu$ (mean equals variance).
Poisson regression
Why is the effect of a predictor **multiplicative** in Poisson (log-link) regression?
Because $\mu=e^{\beta_0}e^{\beta_1x_1}\cdots$, a one-unit increase in $x_j$ multiplies the mean by $e^{\beta_j}$ rather than adding to it. So $e^{\beta_j}$ is the **rate ratio** (multiplicative factor) per unit of $x_j$: $e^{\beta_j}>1$ raises the expected count, $e^{\beta_j}<1$ lowers it. This multiplicative structure matches insurance rating, where rate relativities multiply a base rate.
Poisson regression
A Poisson claim-count model is $\ln\hat\mu=-1.0+0.3x_1+0.7x_2$. For $x_1=2$, $x_2=1$, find the predicted **mean** count.
$\ln\hat\mu=-1.0+0.3(2)+0.7(1)=-1.0+0.6+0.7=0.3$. $\hat\mu=e^{0.3}\approx 1.3499$. The model predicts about $1.35$ expected claims for this risk.
Poisson regression
In a Poisson regression $\hat\beta_j=0.405$ for a risk factor. Interpret it as a **rate ratio**, and give the percent change in expected count.
The rate ratio is $e^{\hat\beta_j}=e^{0.405}\approx 1.4993\approx 1.50$. A one-unit increase in $x_j$ multiplies the expected count by about $1.50$ — a **50% increase** in the predicted mean, holding other predictors fixed. (If instead $\hat\beta_j=-0.405$, the factor is $e^{-0.405}\approx 0.667$, a $33.3\%$ decrease.)
Poisson regression
What is an **offset** (exposure term) in Poisson regression, and how does it enter the model?
When counts arise over differing **exposure** $E$ (policy-years, miles, population), we model the **rate** $\mu/E$. With a log link: $\ln\mu=\ln E+\beta_0+\beta_1x_1+\cdots$, i.e. $\mu=E\cdot e^{\beta_0+\beta_1x_1+\cdots}$. The term $\ln E$ is the **offset** — a predictor with coefficient fixed at $1$ (not estimated). It scales the expected count proportionally to exposure.
Poisson regression
A Poisson model with offset $\ln(\text{exposure})$ has $\ln\hat\mu=\ln E-2.0+0.5x$. A policy has exposure $E=3$ policy-years and $x=2$. Find the predicted **claim count**.
$\ln\hat\mu=\ln 3-2.0+0.5(2)=\ln 3-2.0+1.0=\ln 3-1.0$. $\hat\mu=3\cdot e^{-1.0}=3(0.367879)\approx 1.1036$. Equivalently the per-year rate is $e^{-1.0}\approx 0.3679$ claims/year, times $3$ years $\approx 1.10$ claims.
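A short sketch of a log-link prediction with an optional exposure offset. The helper `poisson_mean` is hypothetical; with `exposure=1.0` the offset $\ln E$ is zero and the function reduces to the plain log-link model.

```python
import math

def poisson_mean(intercept, coefs, xs, exposure=1.0):
    """Expected count: mu = exposure * exp(intercept + sum(coef * x))."""
    eta = math.log(exposure) + intercept + sum(b * x for b, x in zip(coefs, xs))
    return math.exp(eta)

# Model without an offset: ln(mu) = -1.0 + 0.3*x1 + 0.7*x2 at x1=2, x2=1
mu_no_offset = poisson_mean(-1.0, [0.3, 0.7], [2, 1])

# Model with offset: ln(mu) = ln(E) - 2.0 + 0.5*x at E=3, x=2
mu_offset = poisson_mean(-2.0, [0.5], [2], exposure=3)
rate_per_year = mu_offset / 3
```

Because the offset coefficient is fixed at 1, doubling the exposure doubles the predicted count while leaving the per-year rate unchanged.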
Poisson regression
Define **overdispersion** in a Poisson GLM and name a way to detect and accommodate it.
Overdispersion is when the data's variance **exceeds** the Poisson assumption $\text{Var}(Y)=\mu$, often because of unmodeled heterogeneity or clustering. **Detect:** the (scaled) Pearson or deviance statistic divided by its degrees of freedom is well above $1$. **Accommodate:** fit a **quasi-Poisson** model with an estimated dispersion $\phi>1$ (inflating standard errors by $\sqrt{\hat\phi}$), or switch to a **negative binomial** GLM. Ignoring it makes standard errors too small and significance overstated.
Poisson regression
Estimate the **dispersion** from a Poisson fit with Pearson statistic $X^{2}=180$ on $n-p=150$ degrees of freedom. Is there overdispersion?
$\hat\phi=\dfrac{X^{2}}{n-p}=\dfrac{180}{150}=1.2$. Since $\hat\phi=1.2>1$, there is mild **overdispersion**. Standard errors should be inflated by $\sqrt{1.2}\approx 1.095$, so a naive $z=2.0$ becomes about $2.0/1.095\approx 1.83$ — possibly no longer significant at $5\%$.
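The dispersion estimate and the standard-error adjustment can be sketched as follows. The helper `pearson_dispersion` and the five-point toy data set are assumptions made only to show how $X^{2}$ is assembled from fitted means; the summary figures $180$ and $150$ are the ones from the card.

```python
import math

def pearson_dispersion(ys, mus, n_params):
    """phi_hat = sum((y - mu)^2 / V(mu)) / (n - p), with Poisson V(mu) = mu."""
    x2 = sum((y - m) ** 2 / m for y, m in zip(ys, mus))
    return x2 / (len(ys) - n_params)

# Summary numbers from the card
phi_hat = 180 / 150
se_inflation = math.sqrt(phi_hat)      # factor applied to standard errors
adjusted_z = 2.0 / se_inflation        # a naive z of 2.0 after adjustment

# Hypothetical toy data: observed counts and fitted means from a 2-parameter fit
toy_ys = [0, 3, 1, 6, 2]
toy_mus = [1.0, 2.0, 1.5, 3.0, 2.5]
toy_phi = pearson_dispersion(toy_ys, toy_mus, n_params=2)
```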
Estimation & deviance
How are GLM coefficients estimated? Name the method and the algorithm.
By **maximum likelihood**: choose $\hat\beta$ to maximize the log-likelihood $\ell(\beta)$, equivalently solving the score equations $\dfrac{\partial\ell}{\partial\beta}=0$. Except for the normal-identity case these have no closed form, so they are solved numerically by **iteratively reweighted least squares (IRLS)** — equivalent to Fisher scoring / Newton-Raphson with the expected information. Each iteration solves a weighted least-squares problem on an adjusted response until convergence.
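A minimal sketch of IRLS for a logistic model with an intercept and one predictor, in plain Python. The function name `irls_logistic`, the zero starting values, and the eight-point toy data set are assumptions for illustration; in practice the fit would normally be left to a statistics library. Each pass forms the weights $w_i=\hat p_i(1-\hat p_i)$ and the adjusted response $z_i=\hat\eta_i+(y_i-\hat p_i)/w_i$, then solves the $2\times 2$ weighted least-squares normal equations.

```python
import math

def irls_logistic(xs, ys, max_iter=25, tol=1e-10):
    """Fit logit(p) = b0 + b1*x by iteratively reweighted least squares."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for x, y in zip(xs, ys):
            eta = b0 + b1 * x
            p = 1 / (1 + math.exp(-eta))
            w = p * (1 - p)              # IRLS weight (Bernoulli variance)
            z = eta + (y - p) / w        # adjusted (working) response
            s_w += w
            s_wx += w * x
            s_wxx += w * x * x
            s_wz += w * z
            s_wxz += w * x * z
        # Solve the weighted normal equations (X'WX) beta = X'Wz for 2 parameters
        det = s_w * s_wxx - s_wx ** 2
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = max(abs(new_b0 - b0), abs(new_b1 - b1)) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Hypothetical, non-separable toy data
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat = irls_logistic(xs, ys)

# At the MLE the score equations sum(y - p) = 0 and sum(x * (y - p)) = 0 hold
fitted = [1 / (1 + math.exp(-(b0_hat + b1_hat * x))) for x in xs]
score0 = sum(y - p for y, p in zip(ys, fitted))
score1 = sum(x * (y - p) for x, y, p in zip(xs, ys, fitted))
```

Checking that both score components are numerically zero confirms that the iteration has reached the maximum-likelihood solution for this toy data.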
Estimation & deviance
Why can't logistic and Poisson regression coefficients be found with ordinary least squares like normal regression?
OLS minimizes a sum of squared errors that coincides with the MLE only under **normal errors with constant variance**. In logistic/Poisson GLMs the variance depends on the mean ($p(1-p)$ or $\mu$) and the link is nonlinear, so the likelihood is not quadratic in $\beta$ and the score equations are nonlinear. The MLE must be found iteratively (IRLS), which reweights observations by their mean-dependent variances at each step.
Estimation & deviance
Define the **saturated model** and the **deviance** $D$ of a fitted GLM.
The **saturated model** has one parameter per observation, so it fits perfectly ($\hat\mu_i=y_i$) and gives the maximum achievable log-likelihood $\ell_{\text{sat}}$. The **deviance** measures lack of fit: $D=2\phi(\ell_{\text{sat}}-\ell_{\text{model}})$. Dividing by the dispersion gives the **scaled deviance** $D^{*}=D/\phi=2(\ell_{\text{sat}}-\ell_{\text{model}})$, which coincides with $D$ when $\phi=1$ (Poisson, binomial). Smaller deviance means the model is closer to the saturated fit; it generalizes the residual sum of squares.
Estimation & deviance
Define the **deviance residual** and the **Pearson residual** for a GLM.
**Deviance residual:** $r_i^{D}=\text{sign}(y_i-\hat\mu_i)\sqrt{d_i}$, where $d_i$ is observation $i$'s contribution to the deviance ($D=\sum_i d_i$). **Pearson residual:** $r_i^{P}=\dfrac{y_i-\hat\mu_i}{\sqrt{V(\hat\mu_i)}}$, the raw residual divided by the square root of the variance function evaluated at the fitted mean. The sum of squared Pearson residuals is the Pearson $X^{2}$ statistic; the sum of squared deviance residuals is the deviance.
Estimation & deviance
Give the **deviance** formula for a Poisson GLM and compute the contribution from one point with $y_i=4$, $\hat\mu_i=2.5$.
Poisson deviance: $D=2\sum_i\left[y_i\ln\dfrac{y_i}{\hat\mu_i}-(y_i-\hat\mu_i)\right]$. For $y_i=4$, $\hat\mu_i=2.5$: $\dfrac{y_i}{\hat\mu_i}=1.6$, $\ln 1.6\approx 0.470004$. $d_i=2\left[4(0.470004)-(4-2.5)\right]=2\left[1.880016-1.5\right]=2(0.380016)\approx 0.7600$. Its deviance residual is $+\sqrt{0.7600}\approx +0.872$ (positive since $y_i>\hat\mu_i$).
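The per-observation calculation can be sketched as below. The function names are hypothetical, and the convention that $y\ln(y/\hat\mu)$ is taken as $0$ when $y=0$ is the usual limiting value.

```python
import math

def poisson_deviance_term(y, mu):
    """d_i = 2*[y*ln(y/mu) - (y - mu)], with y*ln(y/mu) taken as 0 when y = 0."""
    log_part = y * math.log(y / mu) if y > 0 else 0.0
    return 2 * (log_part - (y - mu))

def deviance_residual(y, mu):
    """Signed square root of the deviance contribution."""
    d = max(poisson_deviance_term(y, mu), 0.0)   # guard against tiny negative rounding
    return math.copysign(math.sqrt(d), y - mu)

d_i = poisson_deviance_term(4, 2.5)
r_i = deviance_residual(4, 2.5)

# A zero count contributes 2*mu, and its residual is negative
d_zero = poisson_deviance_term(0, 2.5)
r_zero = deviance_residual(0, 2.5)
```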
Model comparison
State the **drop-in-deviance** (likelihood-ratio) test for comparing two **nested** GLMs.
Let the smaller (reduced) model be nested in the larger (full) model, differing by $q$ parameters. With $\phi$ known (e.g. $=1$): $\Delta D=D_{\text{reduced}}-D_{\text{full}}=2(\ell_{\text{full}}-\ell_{\text{reduced}})\sim\chi^{2}_{q}$ under $H_0$ (extra terms add nothing). Reject $H_0$ (keep the larger model) when $\Delta D$ exceeds the upper-$\alpha$ critical value $\chi^{2}_{q,\alpha}$. This is the GLM analog of the partial $F$-test.
Model comparison
Reduced model deviance is $60.0$ ($142$ df); adding $3$ predictors gives full-model deviance $48.0$ ($139$ df). Test whether the $3$ predictors jointly matter at $5\%$.
$\Delta D=60.0-48.0=12.0$ with $q=142-139=3$ df. Critical value $\chi^{2}_{3,\,0.05}=7.815$. Since $12.0>7.815$, **reject** $H_0$: at least one of the three added predictors is significant, so the larger model is preferred. (The $p$-value is $P(\chi^{2}_3>12.0)\approx 0.0074$.)
Model comparison
A model's deviance drops by $\Delta D=2.5$ when **one** extra predictor is added. At $5\%$, is the predictor significant?
$q=1$, so compare to $\chi^{2}_{1,\,0.05}=3.841$. Since $\Delta D=2.5<3.841$, **fail to reject** $H_0$: the extra predictor does **not** significantly improve fit, so the simpler model is preferred. (For a single coefficient the likelihood-ratio and Wald tests are asymptotically equivalent, so this corresponds to a Wald $|z|$ of roughly $\sqrt{2.5}\approx 1.58$, below $1.96$.)
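The drop-in-deviance tests in the two preceding cards can be reproduced without a statistics package. This sketch assumes integer degrees of freedom and uses the closed-form chi-square upper tail built from `math.erfc`, `math.gamma`, and a finite series; the names `chi2_sf` and `drop_in_deviance` are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for a positive integer df."""
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        return math.exp(-half) * sum(
            half ** k / math.factorial(k) for k in range(df // 2)
        )
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    series = sum(
        half ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series

def drop_in_deviance(d_reduced, d_full, q):
    """Return (delta_D, p_value) for nested models differing by q parameters."""
    delta = d_reduced - d_full
    return delta, chi2_sf(delta, q)

delta_3, p_3 = drop_in_deviance(60.0, 48.0, 3)   # three added predictors
p_1 = chi2_sf(2.5, 1)                            # one added predictor, delta_D = 2.5
```

Comparing the p-value with $\alpha$ is equivalent to comparing $\Delta D$ with the tabulated critical value: `p_3` falls below $0.05$ while `p_1` does not.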
Model comparison
Define **AIC** and explain why it is used to compare **non-nested** GLMs.
$\text{AIC}=-2\ell(\hat\beta)+2k$, where $\ell$ is the maximized log-likelihood and $k$ the number of estimated parameters. The $-2\ell$ term rewards fit; the $+2k$ term penalizes complexity. **Lower AIC is better.** Unlike the drop-in-deviance test, AIC does not require nesting — it can rank models with different predictor sets (or even different link functions / distributions, if on the same data), trading off fit against parsimony.
Model comparison
Two non-nested GLMs on the same data: Model A has $\ell=-120.0$ with $k=4$; Model B has $\ell=-118.0$ with $k=7$. Compare by **AIC**.
$\text{AIC}_A=-2(-120.0)+2(4)=240.0+8=248.0$. $\text{AIC}_B=-2(-118.0)+2(7)=236.0+14=250.0$. Lower AIC wins, so **Model A** ($248.0<250.0$) is preferred: B's $4$-point gain in $-2\ell$ (from $240\to236$) doesn't justify its $3$ extra parameters (penalty $+6$).
Model comparison
Define **BIC** and contrast its complexity penalty with AIC's.
$\text{BIC}=-2\ell(\hat\beta)+k\ln n$, where $n$ is the sample size. **Lower BIC is better.** Compared with AIC's $+2k$, BIC's penalty is $k\ln n$, which exceeds $2k$ whenever $\ln n>2$, i.e. $n>e^{2}\approx 7.4$ (so $n\ge 8$). For any realistic sample BIC therefore penalizes extra parameters **more harshly**, tends to choose **smaller** models, and is **consistent**: it selects the true model with probability approaching $1$ as $n\to\infty$, provided the true model is among the candidates.
Model comparison
With $n=200$: Model A has $\ell=-120.0$, $k=4$; Model B has $\ell=-118.0$, $k=7$. Compare by **BIC**.
$\ln 200\approx 5.2983$. $\text{BIC}_A=-2(-120.0)+4(5.2983)=240.0+21.193=261.193$. $\text{BIC}_B=-2(-118.0)+7(5.2983)=236.0+37.088=273.088$. Lower BIC wins, so **Model A** ($261.19<273.09$) — BIC's heavier penalty makes the parsimonious model the even clearer choice than under AIC.
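Both information criteria for Models A and B can be computed side by side. A small sketch, with hypothetical helper names `aic` and `bic` and the card's values of $\ell$, $k$, and $n$:

```python
import math

def aic(loglik, k):
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    return -2 * loglik + k * math.log(n)

n = 200
models = {"A": (-120.0, 4), "B": (-118.0, 7)}   # name -> (log-likelihood, parameters)

scores = {name: (aic(ll, k), bic(ll, k, n)) for name, (ll, k) in models.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

Model A wins under both criteria, and the margin is wider under BIC because each extra parameter costs $\ln 200\approx 5.3$ rather than $2$.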
Model comparison
Two logistic models are compared by deviance. The null (intercept-only) deviance is $200.0$; the fitted model with $4$ predictors has deviance $176.0$. Test overall model significance at $5\%$.
This is a drop-in-deviance test of all $4$ slopes jointly: $\Delta D=200.0-176.0=24.0$ with $q=4$ df. Critical value $\chi^{2}_{4,\,0.05}=9.488$. Since $24.0>9.488$, **reject** $H_0:\beta_1=\cdots=\beta_4=0$ — the predictors collectively improve fit significantly. (This is the GLM analog of the overall $F$-test of regression.)
Logistic regression
In a logistic regression, a coefficient has estimate $\hat\beta_1=0.84$ with $SE(\hat\beta_1)=0.30$. Perform the **Wald test** and give the odds ratio with a $95\%$ confidence interval.
Wald statistic: $z=\dfrac{\hat\beta_1}{SE(\hat\beta_1)}=\dfrac{0.84}{0.30}=2.8$. Since $|2.8|>1.96$, $\beta_1$ is significant at $5\%$. Point odds ratio: $e^{0.84}\approx 2.316$. $95\%$ CI for $\beta_1$: $0.84\pm 1.96(0.30)=0.84\pm 0.588=(0.252,\,1.428)$. Exponentiate for the OR CI: $(e^{0.252},e^{1.428})\approx(1.287,\,4.170)$ — excludes $1$, consistent with significance.
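The Wald calculation can be sketched as follows. The function `wald_summary` is hypothetical; it uses the exact normal quantile from `statistics.NormalDist` (about $1.95996$) rather than the rounded $1.96$, so the interval endpoints differ from the card's only in the fourth decimal place.

```python
import math
from statistics import NormalDist

def wald_summary(beta_hat, se, level=0.95):
    """Wald z, two-sided p-value, and odds-ratio point estimate with CI."""
    std_normal = NormalDist()
    z = beta_hat / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(1 - (1 - level) / 2)
    lower, upper = beta_hat - z_crit * se, beta_hat + z_crit * se
    return {
        "z": z,
        "p_value": p_value,
        "odds_ratio": math.exp(beta_hat),
        "or_ci": (math.exp(lower), math.exp(upper)),   # exponentiate the beta CI
    }

result = wald_summary(0.84, 0.30)
```

The interval is built on the log-odds scale and then exponentiated, which is why it is not symmetric around the point odds ratio.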
Estimation & deviance
Compute the **deviance contribution** for a logistic (Bernoulli) observation with $y_i=1$ and fitted $\hat p_i=0.8$, and one with $y_i=0$, $\hat p_i=0.3$.
Bernoulli deviance contribution: $d_i=-2\left[y_i\ln\hat p_i+(1-y_i)\ln(1-\hat p_i)\right]$. For $y_i=1$, $\hat p_i=0.8$: $d_i=-2\ln(0.8)=-2(-0.223144)\approx 0.4463$. For $y_i=0$, $\hat p_i=0.3$: $d_i=-2\ln(1-0.3)=-2\ln(0.7)=-2(-0.356675)\approx 0.7133$. Well-fit points (predicted probability near the observed outcome) contribute small deviance; total deviance $=\sum_i d_i$.
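A brief sketch of the Bernoulli contribution, with a hypothetical helper name. It assumes $0<\hat p_i<1$ so that both logarithms are defined.

```python
import math

def bernoulli_deviance_term(y, p_hat):
    """d_i = -2*[y*ln(p_hat) + (1 - y)*ln(1 - p_hat)] for y in {0, 1}."""
    return -2 * (y * math.log(p_hat) + (1 - y) * math.log(1 - p_hat))

d_event = bernoulli_deviance_term(1, 0.8)      # event observed, high fitted probability
d_nonevent = bernoulli_deviance_term(0, 0.3)   # no event, low fitted probability
total = d_event + d_nonevent                   # deviance of this two-point sample

# A poorly fit point for contrast: event observed with fitted probability 0.1
d_poor = bernoulli_deviance_term(1, 0.1)
```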
Estimation & deviance
A grouped logistic regression has residual deviance $D=210$ on $195$ degrees of freedom. Use the deviance as a **goodness-of-fit** check at $5\%$.
For grouped binomial data, $D\approx\chi^{2}_{n-p}$ under good fit. Here compare $D=210$ to $\chi^{2}_{195}$. A quick rule: $D/\text{df}=210/195\approx 1.077\approx 1$, suggesting adequate fit. More precisely, $\chi^{2}_{195,\,0.05}\approx 228.6$; since $210<228.6$, we **do not reject** adequate fit. (For ungrouped Bernoulli data the deviance is **not** $\chi^2$-distributed, so this GOF use requires grouped/binomial counts.)
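Tables rarely list $195$ degrees of freedom, so the critical value is usually approximated. The sketch below uses the Wilson-Hilferty cube-root approximation, which is an assumption of this example (the card does not say how its value was obtained); the function name is hypothetical. The approximation is good for large df and less accurate for very small df.

```python
from statistics import NormalDist

def chi2_upper_critical(df, alpha):
    """Wilson-Hilferty approximation to the upper-alpha chi-square critical value."""
    z = NormalDist().inv_cdf(1 - alpha)   # standard normal quantile
    c = 2 / (9 * df)
    return df * (1 - c + z * c ** 0.5) ** 3

deviance, df = 210, 195
crit = chi2_upper_critical(df, 0.05)
ratio = deviance / df            # quick rule: values near 1 suggest adequate fit
adequate_fit = deviance < crit   # do not reject adequate fit when True
```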
Logistic regression
Explain why **logistic regression coefficients are interpreted on the log-odds scale**, not as changes in probability.
The model is linear in the **logit**: $\ln\dfrac{p}{1-p}=\beta_0+\beta_1x$. So $\beta_1$ is the additive change in **log-odds** per unit $x$, and $e^{\beta_1}$ is the constant **odds ratio** — these are constant across $x$. The effect on the **probability** is **not** constant: $\dfrac{\partial p}{\partial x}=\beta_1\,p(1-p)$, which is largest near $p=0.5$ and shrinks toward $0$ at the extremes. Hence odds ratios are reported, while marginal probability effects depend on the baseline $p$.
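The changing marginal effect can be seen numerically. In this sketch the coefficients $\beta_0=-2.0$ and $\beta_1=0.8$ are hypothetical values chosen for illustration, as are the function names; the analytic slope $\beta_1 p(1-p)$ is checked against a central finite difference.

```python
import math

def sigmoid(eta):
    return 1 / (1 + math.exp(-eta))

def marginal_effect(beta1, p):
    """Analytic slope dp/dx = beta1 * p * (1 - p)."""
    return beta1 * p * (1 - p)

beta0, beta1 = -2.0, 0.8   # hypothetical coefficients

# Analytic slope at several baseline probabilities
effects = {p: marginal_effect(beta1, p) for p in (0.05, 0.25, 0.5, 0.75, 0.95)}

# Finite-difference check at x = 1.0
x, h = 1.0, 1e-6
p_at_x = sigmoid(beta0 + beta1 * x)
numeric_slope = (sigmoid(beta0 + beta1 * (x + h)) - sigmoid(beta0 + beta1 * (x - h))) / (2 * h)
analytic_slope = marginal_effect(beta1, p_at_x)
```

The slope peaks at $\beta_1/4$ when $p=0.5$ and is symmetric around that point, while the odds ratio $e^{\beta_1}$ stays the same at every $x$.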
Model comparison
Contrast the **deviance** and **AIC** as criteria — which compares nested models and which compares any models?
**Deviance / drop-in-deviance test:** a formal hypothesis test ($\chi^2_q$) for **nested** models; it asks whether extra terms significantly improve fit, but cannot rank models with different (non-overlapping) predictor sets. **AIC (and BIC):** information criteria that attach a complexity penalty to $-2\ell$ and can rank **any** models on the same data — nested or not — by trading fit against parsimony. AIC has no significance threshold; you simply pick the **lowest** value.
Poisson regression
A Poisson rating model gives base rate $e^{\hat\beta_0}$ and three multiplicative relativities. With $\hat\beta_0=-1.6$, urban factor $e^{0.40}$, young-driver factor $e^{0.25}$, and a discount $e^{-0.15}$, find the expected claim frequency for an urban young driver with the discount.
Multiplicative log-link structure: $\hat\mu=e^{\hat\beta_0}\cdot e^{0.40}\cdot e^{0.25}\cdot e^{-0.15}=e^{-1.6+0.40+0.25-0.15}=e^{-1.10}$. $\hat\mu=e^{-1.10}\approx 0.3329$ claims. Equivalently, base $e^{-1.6}\approx 0.2019$ times the combined relativity $e^{0.50}\approx 1.6487$ gives $0.2019\times 1.6487\approx 0.3329$.
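The rating calculation is a product of a base rate and relativities. A minimal sketch using the card's coefficients; the dictionary keys are hypothetical labels.

```python
import math

base_rate = math.exp(-1.6)             # exp(beta_0)
relativities = {
    "urban": math.exp(0.40),
    "young_driver": math.exp(0.25),
    "discount": math.exp(-0.15),
}

combined_relativity = math.prod(relativities.values())
expected_frequency = base_rate * combined_relativity

# Same answer from summing coefficients on the log scale
log_scale = math.exp(-1.6 + 0.40 + 0.25 - 0.15)
```

Multiplying relativities and adding coefficients on the log scale are the same operation, which is why a log-link GLM maps directly onto a multiplicative rating plan.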