{
  "deckName": "Exam MAS-II — Generalized Linear Models",
  "examCode": "Exam MAS-II",
  "cards": [
    {
      "front": "State the **exponential dispersion family (EDF)** density and name each piece.",
      "back": "$f(y;\\theta,\\phi)=\\exp\\!\\left\\{\\frac{y\\theta-b(\\theta)}{\\phi}+c(y,\\phi)\\right\\}$.\n$\\theta$ = **canonical (natural) parameter**, a function of the mean.\n$\\phi$ = **dispersion parameter** (scale).\n$b(\\theta)$ = **cumulant function**; its derivatives give the moments.\n$c(y,\\phi)$ = normalizing term not involving $\\theta$.\nThis single form contains the normal, Poisson, gamma, binomial, and inverse-Gaussian distributions.",
      "tag": "Exponential dispersion family"
    },
    {
      "front": "From the EDF, give the **mean** and **variance** of $Y$ in terms of $b(\\theta)$.",
      "back": "$\\mu=E[Y]=b'(\\theta)$ and $\\text{Var}(Y)=\\phi\\,b''(\\theta)$.\nSince $b''(\\theta)$ can be written as a function of the mean, define the **variance function** $V(\\mu)=b''(\\theta)$, so $\\text{Var}(Y)=\\phi\\,V(\\mu)$. The mean comes from the **first** derivative of the cumulant function and the variance from the **second**.",
      "tag": "Exponential dispersion family"
    },
    {
      "front": "Show that the **Poisson** distribution is in the EDF and identify $\\theta$, $b(\\theta)$, $\\phi$, and $V(\\mu)$.",
      "back": "$P(Y=y)=\\frac{e^{-\\mu}\\mu^{y}}{y!}=\\exp\\{y\\ln\\mu-\\mu-\\ln y!\\}$.\nMatching the EDF: $\\theta=\\ln\\mu$, so $\\mu=e^{\\theta}$ and $b(\\theta)=e^{\\theta}=\\mu$; $\\phi=1$; $c(y,\\phi)=-\\ln y!$.\nThen $b'(\\theta)=e^{\\theta}=\\mu$ (mean) and $b''(\\theta)=e^{\\theta}=\\mu$, so $V(\\mu)=\\mu$ and $\\text{Var}(Y)=\\mu$.",
      "tag": "Exponential dispersion family"
    },
    {
      "front": "Show the **normal** $N(\\mu,\\sigma^{2})$ distribution is in the EDF and identify $\\theta$, $b(\\theta)$, $\\phi$, $V(\\mu)$.",
      "back": "$f(y)=\\exp\\!\\left\\{\\frac{y\\mu-\\mu^{2}/2}{\\sigma^{2}}-\\frac{y^{2}}{2\\sigma^{2}}-\\frac{1}{2}\\ln(2\\pi\\sigma^{2})\\right\\}$.\nThus $\\theta=\\mu$, $b(\\theta)=\\frac{\\theta^{2}}{2}$, and $\\phi=\\sigma^{2}$.\n$b'(\\theta)=\\theta=\\mu$ (mean) and $b''(\\theta)=1$, so $V(\\mu)=1$ and $\\text{Var}(Y)=\\sigma^{2}\\cdot 1=\\sigma^{2}$ — the variance is constant, independent of the mean.",
      "tag": "Exponential dispersion family"
    },
    {
      "front": "Give the **variance function** $V(\\mu)$ for the normal, Poisson, gamma, binomial (proportion), and inverse-Gaussian distributions.",
      "back": "Normal: $V(\\mu)=1$.\nPoisson: $V(\\mu)=\\mu$.\nGamma: $V(\\mu)=\\mu^{2}$.\nBinomial proportion: $V(\\mu)=\\mu(1-\\mu)$.\nInverse Gaussian: $V(\\mu)=\\mu^{3}$.\nThe variance function is the GLM's fingerprint: it tells you how the response's spread scales with its mean and so dictates the distributional choice.",
      "tag": "Link & variance functions"
    },
    {
      "front": "Define the **Tweedie** family and explain why it suits aggregate insurance losses.",
      "back": "The Tweedie family has power variance function $V(\\mu)=\\mu^{p}$. Special cases: $p=0$ normal, $p=1$ Poisson, $p=2$ gamma, $p=3$ inverse Gaussian.\nFor $1<p<2$ the Tweedie is a **compound Poisson–gamma**: a point mass at zero (no claims) plus a continuous positive part (claim amounts). This exactly matches pure-premium data — many exact zeros mixed with positive aggregate losses — making one Tweedie GLM serve where frequency $\\times$ severity would otherwise be modeled separately.",
      "tag": "Link & variance functions"
    },
    {
      "front": "Name the **three components** of a generalized linear model.",
      "back": "1. **Random component:** the response $Y$ has a distribution from the exponential dispersion family with mean $\\mu$.\n2. **Systematic component (linear predictor):** $\\eta=X\\beta=\\beta_0+\\beta_1 x_1+\\cdots+\\beta_p x_p$, a linear combination of covariates.\n3. **Link function:** $g(\\mu)=\\eta$ connects the mean to the linear predictor, so $\\mu=g^{-1}(\\eta)$.\nOrdinary linear regression is the special case: normal random component, identity link.",
      "tag": "Link & variance functions"
    },
    {
      "front": "What is a **link function** $g$, and what must it satisfy?",
      "back": "The link relates the mean to the linear predictor: $g(\\mu)=\\eta=X\\beta$, so $\\mu=g^{-1}(\\eta)$.\nIt must be **monotonic and differentiable** so that $g^{-1}$ exists and maps the unbounded $\\eta\\in(-\\infty,\\infty)$ into the valid range of $\\mu$.\nExample: the **log link** $g(\\mu)=\\ln\\mu$ keeps fitted means positive ($\\mu=e^{\\eta}>0$), which is essential for counts and severities.",
      "tag": "Link & variance functions"
    },
    {
      "front": "Define the **canonical link** and give it for the normal, Poisson, gamma, and binomial.",
      "back": "The canonical link sets $\\eta=\\theta$, i.e. $g(\\mu)=\\theta(\\mu)$, the natural parameter as a function of the mean.\nNormal: identity, $g(\\mu)=\\mu$.\nPoisson: log, $g(\\mu)=\\ln\\mu$.\nGamma: inverse (reciprocal), $g(\\mu)=\\mu^{-1}$ (sometimes the log link is used instead for interpretability).\nBinomial: logit, $g(\\mu)=\\ln\\frac{\\mu}{1-\\mu}$.\nCanonical links give simpler estimating equations and guarantee a concave log-likelihood.",
      "tag": "Link & variance functions"
    },
    {
      "front": "Why is the **log link** the standard choice for insurance ratemaking?",
      "back": "With a log link $\\ln\\mu=\\beta_0+\\beta_1 x_1+\\cdots$, exponentiating gives a **multiplicative** model:\n$\\mu=e^{\\beta_0}e^{\\beta_1 x_1}\\cdots e^{\\beta_p x_p}$.\nThe base rate $e^{\\beta_0}$ is multiplied by a **relativity** $e^{\\beta_j}$ for each rating factor — exactly the structure of classification rating plans. It also forces $\\mu>0$, which is required for frequencies and severities, and keeps factors from producing negative premiums.",
      "tag": "Insurance applications"
    },
    {
      "front": "Define the **logit link** and the **logistic regression** model for a binary outcome.",
      "back": "For $Y\\in\\{0,1\\}$ with $\\mu=P(Y=1)$, the logit link is $g(\\mu)=\\ln\\frac{\\mu}{1-\\mu}=\\eta=X\\beta$.\nInverting, $\\mu=\\frac{e^{\\eta}}{1+e^{\\eta}}=\\frac{1}{1+e^{-\\eta}}$, a probability in $(0,1)$.\nThe quantity $\\frac{\\mu}{1-\\mu}$ is the **odds**; $e^{\\beta_j}$ is the **odds ratio** for a one-unit increase in $x_j$.",
      "tag": "Link & variance functions"
    },
    {
      "front": "What is an **offset** in a GLM and when is it used?",
      "back": "An offset is a covariate with its coefficient fixed at $1$: $\\eta=\\ln(\\text{exposure})+X\\beta$.\nWith a log link this models the **rate per unit exposure**: $\\mu=\\text{exposure}\\cdot e^{X\\beta}$, so the linear predictor explains the per-exposure rate while the count scales with exposure.\nUsed in Poisson frequency models where exposure (e.g. car-years, policy-months) varies by record; the offset $\\ln(\\text{exposure})$ enters with no estimated coefficient.",
      "tag": "Insurance applications"
    },
    {
      "front": "A Poisson GLM for claim frequency has $\\ln\\mu=-2.0+0.4\\,x_{\\text{urban}}$ with an offset $\\ln E$ for exposure $E$. For an urban policy with $E=2.5$ car-years, find the **expected number of claims**.",
      "back": "Linear predictor: $\\eta=\\ln E + (-2.0)+0.4(1)=\\ln 2.5 -1.6$.\n$\\ln 2.5\\approx 0.91629$, so $\\eta\\approx 0.91629-1.6=-0.68371$.\nExpected claims $=e^{\\eta}=e^{-0.68371}\\approx 0.5047$.\nEquivalently $\\mu=E\\cdot e^{-2.0}e^{0.4}=2.5(0.135335)(1.491825)\\approx 2.5(0.201897)\\approx 0.5047$ claims.",
      "tag": "Insurance applications"
    },
    {
      "front": "A log-link severity GLM has $\\ln\\mu=6.2+0.30\\,x_1-0.15\\,x_2$. Find the predicted mean severity for $x_1=1,\\;x_2=0$, and the relativity for $x_1$.",
      "back": "Linear predictor: $\\eta=6.2+0.30(1)-0.15(0)=6.5$.\nPredicted mean: $\\mu=e^{6.5}\\approx 665.14$.\nRelativity for $x_1$: $e^{0.30}\\approx 1.3499$ — a one-unit increase in $x_1$ multiplies the expected severity by about $1.35$, a $35\\%$ increase, holding $x_2$ fixed.",
      "tag": "Insurance applications"
    },
    {
      "front": "How do you interpret a **log-link coefficient** $\\beta_j$ as a multiplicative effect?",
      "back": "Under $\\ln\\mu=\\beta_0+\\sum\\beta_j x_j$, increasing $x_j$ by one unit multiplies the mean by $e^{\\beta_j}$:\n$\\frac{\\mu_{\\text{new}}}{\\mu_{\\text{old}}}=e^{\\beta_j}$.\nSo $e^{\\beta_j}$ is the **relativity**; $(e^{\\beta_j}-1)\\times 100\\%$ is the percent change. Example: $\\beta_j=0.182$ gives $e^{0.182}\\approx 1.20$, a $20\\%$ increase per unit of $x_j$.",
      "tag": "Insurance applications"
    },
    {
      "front": "A classification log-link GLM has base level $e^{\\beta_0}=300$, a Region B relativity $e^{\\beta_{\\text{B}}}=1.25$, and a Class 2 relativity $e^{\\beta_{\\text{2}}}=0.80$. Find the predicted pure premium for a **Region B, Class 2** policy.",
      "back": "Multiplicative model: $\\mu=e^{\\beta_0}\\cdot e^{\\beta_{\\text{B}}}\\cdot e^{\\beta_{\\text{2}}}=300(1.25)(0.80)$.\n$300\\times 1.25=375$; $375\\times 0.80=300$.\nPredicted pure premium $=\\$300$. The two relativities offset each other, returning the base. This is the standard multiplicative rating-plan structure a log link produces.",
      "tag": "Insurance applications"
    },
    {
      "front": "In a **logistic regression** for lapse, $\\ln\\frac{\\mu}{1-\\mu}=-1.5+0.8\\,x$. For $x=1$, find the predicted lapse probability and the odds ratio for $x$.",
      "back": "Linear predictor: $\\eta=-1.5+0.8(1)=-0.7$.\nProbability: $\\mu=\\frac{1}{1+e^{-\\eta}}=\\frac{1}{1+e^{0.7}}=\\frac{1}{1+2.01375}=\\frac{1}{3.01375}\\approx 0.3318$.\nOdds ratio for a one-unit rise in $x$: $e^{0.8}\\approx 2.2255$ — the odds of lapsing multiply by about $2.23$.",
      "tag": "Insurance applications"
    },
    {
      "front": "How are GLM coefficients **estimated**, and why not ordinary least squares?",
      "back": "GLM coefficients are estimated by **maximum likelihood** — choose $\\beta$ to maximize the EDF log-likelihood.\nOLS minimizes squared error, which is the MLE **only** for the normal/identity case. For non-normal responses (Poisson, gamma, binomial) the variance depends on the mean and the link is nonlinear, so the likelihood equations have no closed form and must be solved numerically.\nThe standard algorithm is **iteratively reweighted least squares (IRLS)**.",
      "tag": "Estimation & IRLS"
    },
    {
      "front": "Describe the **IRLS** algorithm for fitting a GLM.",
      "back": "Iteratively reweighted least squares is Fisher scoring applied to the GLM score equations. Each iteration:\n1. Form the **adjusted response** $z_i=\\eta_i+(y_i-\\mu_i)g'(\\mu_i)$.\n2. Form weights $w_i=\\frac{1}{V(\\mu_i)\\,[g'(\\mu_i)]^{2}}$.\n3. Solve the **weighted least squares** problem regressing $z$ on $X$ with weights $w$ to update $\\beta$.\n4. Recompute $\\eta=X\\beta$, $\\mu=g^{-1}(\\eta)$, and the weights; repeat until $\\beta$ converges.\nThe weights and working response are recomputed every step — hence \"iteratively reweighted.\"",
      "tag": "Estimation & IRLS"
    },
    {
      "front": "Write the **score (likelihood) equations** that the MLE of $\\beta$ solves in a GLM.",
      "back": "For each coefficient $j$, $\\sum_{i=1}^{n}\\frac{(y_i-\\mu_i)\\,x_{ij}}{V(\\mu_i)\\,g'(\\mu_i)}=0$ (the $\\phi$ cancels).\nThese are nonlinear in $\\beta$ because $\\mu_i=g^{-1}(x_i^{\\top}\\beta)$. Under the **canonical link** they simplify to $X^{\\top}y=X^{\\top}\\mu$, i.e. the model reproduces the covariate-weighted totals of the data — for an intercept that forces $\\sum y_i=\\sum\\mu_i$.",
      "tag": "Estimation & IRLS"
    },
    {
      "front": "For a Poisson GLM with **only an intercept** and log link, show that $\\hat\\mu=\\bar y$. Data: $y=(2,4,3,5,6)$.",
      "back": "Canonical-link score equation forces $\\sum y_i=\\sum\\mu_i$. With one parameter every fitted value equals $\\hat\\mu$, so $\\sum y_i=n\\hat\\mu\\Rightarrow\\hat\\mu=\\bar y$.\n$\\sum y_i=2+4+3+5+6=20$, $n=5$, so $\\hat\\mu=\\frac{20}{5}=4$.\nThe fitted intercept is $\\hat\\beta_0=\\ln 4\\approx 1.3863$ since $e^{\\hat\\beta_0}=\\hat\\mu=4$.",
      "tag": "Estimation & IRLS"
    },
    {
      "front": "Why does the **dispersion parameter** $\\phi$ not affect the point estimates $\\hat\\beta$, but does affect their standard errors?",
      "back": "$\\phi$ cancels out of the score equations $\\sum\\frac{(y_i-\\mu_i)x_{ij}}{V(\\mu_i)g'(\\mu_i)}=0$, so the location of the maximum — the $\\hat\\beta$ — does not depend on $\\phi$.\nBut the covariance of $\\hat\\beta$ is $\\widehat{\\text{Cov}}(\\hat\\beta)=\\phi\\,(X^{\\top}WX)^{-1}$, which scales with $\\phi$. So an under- or over-estimated $\\phi$ leaves the fitted means unchanged but distorts standard errors, confidence intervals, and significance tests.",
      "tag": "Estimation & IRLS"
    },
    {
      "front": "Define the **deviance** $D$ of a fitted GLM.",
      "back": "$D=2\\phi\\,(\\ell_{\\text{sat}}-\\ell_{\\text{model}})$, where $\\ell_{\\text{sat}}$ is the maximized log-likelihood of the **saturated model** (one parameter per observation, $\\hat\\mu_i=y_i$) and $\\ell_{\\text{model}}$ is that of the fitted model.\nThe deviance is a likelihood-ratio measure of how far the fitted model is from a perfect fit — the GLM analogue of the residual sum of squares. Smaller deviance means better fit.",
      "tag": "Deviance & residuals"
    },
    {
      "front": "Distinguish **deviance** $D$ from **scaled deviance** $D^{*}$.",
      "back": "The (unscaled) **deviance** is $D=2(\\ell_{\\text{sat}}-\\ell_{\\text{model}})\\,\\phi$ expressed in the data's units — for normal data $D=\\sum(y_i-\\hat\\mu_i)^2$.\nThe **scaled deviance** is $D^{*}=\\frac{D}{\\phi}=2(\\ell_{\\text{sat}}-\\ell_{\\text{model}})$, dimensionless.\nIt is $D^{*}$ that is (asymptotically) $\\chi^{2}$ distributed. When $\\phi=1$ (Poisson, binomial) the two coincide.",
      "tag": "Deviance & residuals"
    },
    {
      "front": "Give the **deviance contribution** for a Poisson observation and compute the total deviance. Data $y=(3,5,2)$, fitted $\\hat\\mu=(4,4,4)$.",
      "back": "Poisson deviance: $D=2\\sum\\left[y_i\\ln\\frac{y_i}{\\hat\\mu_i}-(y_i-\\hat\\mu_i)\\right]$ (with $\\phi=1$).\nTerm 1: $3\\ln\\frac{3}{4}-(3-4)=3(-0.287682)+1=0.137$ (i.e. $-0.863+1$).\nTerm 2: $5\\ln\\frac{5}{4}-(5-4)=5(0.223144)-1=1.116-1=0.116$.\nTerm 3: $2\\ln\\frac{2}{4}-(2-4)=2(-0.693147)+2=-1.386+2=0.614$.\n$D=2(0.137+0.116+0.614)=2(0.867)\\approx 1.733$.",
      "tag": "Deviance & residuals"
    },
    {
      "front": "Give the **Pearson** $\\chi^{2}$ statistic and the **Pearson residual**.",
      "back": "Pearson residual: $r_i^{P}=\\frac{y_i-\\hat\\mu_i}{\\sqrt{V(\\hat\\mu_i)}}$.\nPearson $\\chi^{2}=\\sum_{i=1}^{n}\\frac{(y_i-\\hat\\mu_i)^{2}}{V(\\hat\\mu_i)}=\\sum (r_i^{P})^{2}$.\nIt is an alternative to the deviance for assessing fit; under a correct model the scaled version $\\frac{X^{2}}{\\phi}$ is approximately $\\chi^{2}_{n-p}$.",
      "tag": "Deviance & residuals"
    },
    {
      "front": "Compute the **Pearson** $\\chi^{2}$ for a Poisson fit with $y=(3,5,2)$, $\\hat\\mu=(4,4,4)$.",
      "back": "For Poisson, $V(\\hat\\mu_i)=\\hat\\mu_i$, so $X^{2}=\\sum\\frac{(y_i-\\hat\\mu_i)^{2}}{\\hat\\mu_i}$.\n$\\frac{(3-4)^2}{4}=\\frac{1}{4}=0.25$.\n$\\frac{(5-4)^2}{4}=\\frac{1}{4}=0.25$.\n$\\frac{(2-4)^2}{4}=\\frac{4}{4}=1.00$.\n$X^{2}=0.25+0.25+1.00=1.50$.\nThis is close to the deviance $1.733$ from the same data — the two fit measures usually agree.",
      "tag": "Deviance & residuals"
    },
    {
      "front": "Define the **deviance residual** $r_i^{D}$ and how it relates to $D$.",
      "back": "$r_i^{D}=\\text{sign}(y_i-\\hat\\mu_i)\\sqrt{d_i}$, where $d_i\\ge 0$ is observation $i$'s contribution to the deviance, so $D=\\sum_i d_i=\\sum_i (r_i^{D})^{2}$.\nDeviance residuals are usually closer to normality than Pearson residuals, making them the preferred residual for diagnostic plots. Each one shows how much a single observation worsens the overall fit, signed by direction.",
      "tag": "Deviance & residuals"
    },
    {
      "front": "How is the **deviance** used to test **nested models** (the drop-in-deviance test)?",
      "back": "For a smaller model $M_0$ nested in a larger model $M_1$ with $\\phi$ known (e.g. $\\phi=1$), under $H_0$ (the extra terms are zero):\n$\\Delta D^{*}=D^{*}_{0}-D^{*}_{1}\\;\\dot\\sim\\;\\chi^{2}_{\\;\\Delta p}$,\nwhere $\\Delta p$ is the number of extra parameters in $M_1$. Reject $H_0$ (keep the larger model) when $\\Delta D^{*}$ exceeds the $\\chi^{2}_{\\Delta p}$ critical value. This is a likelihood-ratio test in deviance form.",
      "tag": "Model selection & diagnostics"
    },
    {
      "front": "A Poisson GLM ($\\phi=1$): reduced model deviance $D_0=45.2$ on $30$ df; full model deviance $D_1=33.0$ on $27$ df. Test the $3$ added terms at $\\alpha=0.05$ ($\\chi^{2}_{3,0.95}=7.815$).",
      "back": "Drop-in-deviance statistic: $\\Delta D=D_0-D_1=45.2-33.0=12.2$.\nDegrees of freedom: $\\Delta p=30-27=3$.\nCompare $12.2$ to $\\chi^{2}_{3,0.95}=7.815$. Since $12.2>7.815$, **reject** $H_0$: the three added predictors significantly improve the fit, so keep the fuller model.",
      "tag": "Model selection & diagnostics"
    },
    {
      "front": "When $\\phi$ must be **estimated** (e.g. gamma, normal), how do you compare nested GLMs?",
      "back": "Use an **F-test** instead of the $\\chi^{2}$ drop-in-deviance, because dividing by an estimated $\\phi$ introduces extra sampling variability:\n$F=\\frac{(D_0-D_1)/\\Delta p}{D_1/(n-p_1)}=\\frac{(D_0-D_1)/(p_1-p_0)}{\\hat\\phi}$,\nwhere $\\hat\\phi=\\frac{D_1}{n-p_1}$ (or the Pearson estimate). Compare to $F_{\\Delta p,\\;n-p_1}$. Large $F$ favors the larger model.",
      "tag": "Model selection & diagnostics"
    },
    {
      "front": "A gamma GLM: $D_0=120$ ($p_0=4$), $D_1=96$ ($p_1=7$), $n=50$. Carry out the F-test for the $3$ extra terms ($F_{3,43,0.95}\\approx 2.82$).",
      "back": "Estimate dispersion from the full model: $\\hat\\phi=\\frac{D_1}{n-p_1}=\\frac{96}{50-7}=\\frac{96}{43}\\approx 2.2326$.\nNumerator: $\\frac{D_0-D_1}{p_1-p_0}=\\frac{120-96}{3}=\\frac{24}{3}=8$.\n$F=\\frac{8}{2.2326}\\approx 3.583$.\nSince $3.583>F_{3,43,0.95}\\approx 2.82$, **reject** $H_0$: the three added covariates significantly improve the gamma model.",
      "tag": "Model selection & diagnostics"
    },
    {
      "front": "Define **AIC** and **BIC** and state how to use them in GLM selection.",
      "back": "$\\text{AIC}=-2\\ell(\\hat\\beta)+2p$ and $\\text{BIC}=-2\\ell(\\hat\\beta)+p\\ln n$, where $p$ is the number of estimated parameters and $n$ the sample size.\nBoth reward fit (small $-2\\ell$) and penalize complexity; **lower is better**. BIC's penalty $\\ln n$ exceeds AIC's $2$ once $n>7.4$, so BIC favors **smaller** models. Unlike the drop-in-deviance test, they can compare **non-nested** models.",
      "tag": "Model selection & diagnostics"
    },
    {
      "front": "Model A has $-2\\ell=210$ with $p=5$; Model B has $-2\\ell=204$ with $p=9$; $n=100$. Pick the better model by **AIC** and by **BIC**.",
      "back": "**AIC:** $A=210+2(5)=220$; $B=204+2(9)=222$. Model **A** wins (lower AIC).\n**BIC** ($\\ln 100\\approx 4.60517$): $A=210+5(4.60517)=210+23.03=233.03$; $B=204+9(4.60517)=204+41.45=245.45$. Model **A** wins again, by a wider margin since BIC penalizes B's extra parameters harder.\nThe $6$-point likelihood gain from B does not justify its $4$ extra parameters.",
      "tag": "Model selection & diagnostics"
    },
    {
      "front": "What is **overdispersion**, how do you detect it, and why does it matter?",
      "back": "Overdispersion is when the data's variance exceeds the value the assumed distribution implies — e.g. a Poisson model where $\\text{Var}(Y)>\\mu$. A quick check: the ratio $\\frac{X^{2}}{n-p}$ (or $\\frac{D}{n-p}$) is well above $1$.\nIf ignored it leaves $\\hat\\beta$ unchanged but makes **standard errors too small**, so terms look more significant than they are. Remedies: an estimated dispersion $\\hat\\phi$ (quasi-Poisson) or a different distribution (negative binomial).",
      "tag": "Model selection & diagnostics"
    },
    {
      "front": "A Poisson GLM has Pearson $X^{2}=180$ on $n-p=60$ degrees of freedom. Estimate the **overdispersion factor** and adjust the standard errors.",
      "back": "Dispersion estimate: $\\hat\\phi=\\frac{X^{2}}{n-p}=\\frac{180}{60}=3.0$, well above $1$ — clear overdispersion.\nThe quasi-Poisson correction multiplies the naive covariance by $\\hat\\phi$, so each standard error is inflated by $\\sqrt{\\hat\\phi}=\\sqrt{3}\\approx 1.732$.\nA coefficient with naive SE $0.10$ should be reported with SE $\\approx 0.173$; its $z$-statistic shrinks by the same factor $1.732$, tempering false significance.",
      "tag": "Model selection & diagnostics"
    },
    {
      "front": "What is **quasi-likelihood**, and when is it useful?",
      "back": "Quasi-likelihood specifies only a **mean–variance relationship** $\\text{Var}(Y)=\\phi\\,V(\\mu)$ and a link, without naming a full EDF distribution. The quasi-score equations are the same as a GLM's, $\\sum\\frac{(y_i-\\mu_i)x_{ij}}{\\phi V(\\mu_i)g'(\\mu_i)}=0$, so $\\hat\\beta$ is found the same way.\nIt is used for **overdispersion** (quasi-Poisson with $V(\\mu)=\\mu$ but $\\phi$ estimated $>1$) when no exact distribution fits, giving valid estimates and corrected standard errors.",
      "tag": "Estimation & IRLS"
    },
    {
      "front": "Estimate $\\phi$ two ways for a gamma GLM. Deviance $D=86.4$, Pearson $X^{2}=78.0$, $n=44$, $p=4$.",
      "back": "Residual degrees of freedom: $n-p=44-4=40$.\n**Deviance estimate:** $\\hat\\phi_D=\\frac{D}{n-p}=\\frac{86.4}{40}=2.16$.\n**Pearson estimate:** $\\hat\\phi_P=\\frac{X^{2}}{n-p}=\\frac{78.0}{40}=1.95$.\nThe Pearson estimate is the more common default for the gamma dispersion. Both feed into $\\widehat{\\text{Cov}}(\\hat\\beta)=\\hat\\phi(X^{\\top}WX)^{-1}$ for standard errors.",
      "tag": "Estimation & IRLS"
    },
    {
      "front": "Use the **Wald test** to assess a single coefficient. A GLM gives $\\hat\\beta_1=0.52$ with standard error $0.18$. Test $H_0:\\beta_1=0$ at $\\alpha=0.05$.",
      "back": "Wald statistic: $z=\\frac{\\hat\\beta_1}{\\text{SE}}=\\frac{0.52}{0.18}\\approx 2.889$.\nCompare to the standard-normal critical value $z_{0.975}=1.96$. Since $2.889>1.96$, **reject** $H_0$ — $\\beta_1$ is significantly nonzero.\nApprox. $95\\%$ CI for $\\beta_1$: $0.52\\pm 1.96(0.18)=0.52\\pm 0.3528=(0.167,\\,0.873)$; exponentiating gives a relativity CI of $(e^{0.167},e^{0.873})\\approx(1.18,\\,2.39)$.",
      "tag": "Model selection & diagnostics"
    },
    {
      "front": "Why are **deviance/AIC differences** preferred over $R^{2}$ for comparing GLMs?",
      "back": "Ordinary $R^{2}$ is built on the normal-model residual sum of squares and has no clean meaning when the variance changes with the mean and the link is nonlinear. GLM fit is instead judged by **likelihood-based** measures: deviance (likelihood-ratio fit), drop-in-deviance / F-tests for nested comparisons, and AIC/BIC for penalized comparison (including non-nested models).\nPseudo-$R^{2}$ measures exist but are descriptive only; inference rests on the deviance and information criteria.",
      "tag": "Model selection & diagnostics"
    },
    {
      "front": "In a gamma severity GLM with **log link**, the residual deviance for the **null** (intercept-only) model is $58.0$ and for the **fitted** model is $40.0$. Comment on how much variation the predictors explain.",
      "back": "The drop in deviance from adding predictors is $58.0-40.0=18.0$ on $\\Delta p$ degrees of freedom — a deviance-based analogue of the explained sum of squares.\nA \"deviance pseudo-$R^{2}$\" is $1-\\frac{40.0}{58.0}=1-0.6897\\approx 0.310$, i.e. the covariates account for about $31\\%$ of the null deviance. Whether the $18.0$ drop is significant is judged by a drop-in-deviance F-test (gamma $\\phi$ is estimated), not by the percentage alone.",
      "tag": "Model selection & diagnostics"
    },
    {
      "front": "Why does the **gamma GLM with log link** pair naturally with the **Poisson GLM with log link** in a frequency–severity pure-premium model?",
      "back": "Frequency (counts per exposure) is modeled with a **Poisson log-link** GLM (variance $\\propto\\mu$, offset $\\ln$ exposure); severity (average cost per claim) is modeled with a **gamma log-link** GLM (variance $\\propto\\mu^{2}$, so larger claims are noisier — a realistic right-skewed cost shape).\nBoth being multiplicative, the **pure premium** is their product: $\\widehat{\\text{PP}}=\\hat\\mu_{\\text{freq}}\\times\\hat\\mu_{\\text{sev}}$, and the combined relativity for a factor is the product of its frequency and severity relativities.",
      "tag": "Insurance applications"
    },
    {
      "front": "A multiplicative rating model has frequency relativity $1.30$ and severity relativity $0.90$ for \"young driver,\" with a base pure premium of $\\$500$. Find the young-driver pure premium and its overall relativity.",
      "back": "Overall pure-premium relativity $=$ frequency relativity $\\times$ severity relativity $=1.30\\times 0.90=1.17$.\nYoung-driver pure premium $=\\$500\\times 1.17=\\$585$.\nSo despite a $10\\%$ lower average severity, the $30\\%$ higher frequency dominates, giving a net $17\\%$ surcharge — exactly the product rule a log-link (multiplicative) GLM enforces.",
      "tag": "Insurance applications"
    },
    {
      "front": "Compare the **canonical link** with a **non-canonical** link choice in practice (e.g. gamma severity).",
      "back": "The gamma's canonical link is the **inverse** $g(\\mu)=\\mu^{-1}$, which guarantees a concave log-likelihood and slightly simpler estimating equations.\nBut actuaries usually fit gamma severities with the **log link** instead: it keeps $\\mu>0$, gives interpretable **multiplicative relativities** $e^{\\beta_j}$, and matches the rating-plan structure. The exam point: the canonical link is mathematically convenient, but the link should be chosen for the application — log link for multiplicative pricing regardless of canonical status.",
      "tag": "Link & variance functions"
    }
  ]
}