{
  "deckName": "Exam SRM — Generalized Linear Models",
  "examCode": "Exam SRM",
  "cards": [
    {
      "front": "Name the **three components** of a generalized linear model (GLM).",
      "back": "1. **Random component:** the response $Y$ has a distribution in the **linear-exponential family** (normal, binomial, Poisson, gamma, inverse Gaussian), with mean $\\mu=E[Y]$.\n2. **Systematic component:** a **linear predictor** $\\eta=X\\beta=\\beta_0+\\beta_1x_1+\\cdots+\\beta_px_p$.\n3. **Link function** $g$: a monotonic, differentiable function tying the two together via $g(\\mu)=\\eta$, so $\\mu=g^{-1}(\\eta)$.",
      "tag": "Link functions"
    },
    {
      "front": "Write the **linear-exponential family** density and identify $\\theta$ and $\\phi$.",
      "back": "$f(y;\\theta,\\phi)=\\exp\\!\\left\\{\\dfrac{y\\theta-b(\\theta)}{\\phi}+c(y,\\phi)\\right\\}$.\n$\\theta$ is the **natural (canonical) parameter**, which depends on the mean.\n$\\phi$ is the **dispersion parameter** (scale), often known or estimated separately.\n$b(\\theta)$ is the **cumulant function** and $c(y,\\phi)$ a normalizing term not involving $\\theta$.",
      "tag": "Exponential family"
    },
    {
      "front": "In the linear-exponential family, give the **mean** and **variance** of $Y$ in terms of $b(\\theta)$.",
      "back": "$E[Y]=\\mu=b'(\\theta)$ — the mean is the first derivative of the cumulant function.\n$\\text{Var}(Y)=\\phi\\,b''(\\theta)$ — the variance is the dispersion times the second derivative.\nWriting $b''(\\theta)$ as a function of $\\mu$ gives the **variance function** $V(\\mu)$, so $\\text{Var}(Y)=\\phi\\,V(\\mu)$.",
      "tag": "Exponential family"
    },
    {
      "front": "What is the **variance function** $V(\\mu)$ for the normal, Poisson, binomial, and gamma GLMs?",
      "back": "**Normal:** $V(\\mu)=1$ (variance constant, $\\phi=\\sigma^{2}$).\n**Poisson:** $V(\\mu)=\\mu$ (variance equals the mean, $\\phi=1$).\n**Binomial** (proportion form): $V(\\mu)=\\mu(1-\\mu)$.\n**Gamma:** $V(\\mu)=\\mu^{2}$ (constant coefficient of variation).\nThe variance function is the signature that identifies the random-component distribution.",
      "tag": "Exponential family"
    },
    {
      "front": "Show that the **Poisson** distribution belongs to the linear-exponential family and read off $\\theta$, $b(\\theta)$, and $\\phi$.",
      "back": "$f(y)=\\dfrac{e^{-\\mu}\\mu^{y}}{y!}=\\exp\\{y\\ln\\mu-\\mu-\\ln y!\\}$.\nMatching $\\dfrac{y\\theta-b(\\theta)}{\\phi}+c(y,\\phi)$ with $\\phi=1$: $\\theta=\\ln\\mu$, so $\\mu=e^{\\theta}$ and $b(\\theta)=e^{\\theta}=\\mu$.\nCheck: $b'(\\theta)=e^{\\theta}=\\mu$ ✓ (mean) and $b''(\\theta)=e^{\\theta}=\\mu$, so $\\text{Var}(Y)=\\mu$ ✓. The canonical parameter is the **log mean**, which is why the **log link** is canonical for Poisson.",
      "tag": "Exponential family"
    },
    {
      "front": "Define the **link function** $g$ and the **canonical link**.",
      "back": "A link $g$ relates the mean to the linear predictor: $g(\\mu)=\\eta=X\\beta$, with inverse (mean) function $\\mu=g^{-1}(\\eta)$.\nThe **canonical link** sets the linear predictor equal to the natural parameter, $g(\\mu)=\\theta$. It gives convenient algebra (the sufficient statistic is $X^{\\top}y$) but is not required — any monotone differentiable $g$ mapping to the correct range is admissible.",
      "tag": "Link functions"
    },
    {
      "front": "List the **canonical link** for the normal, binomial, and Poisson GLMs.",
      "back": "**Normal:** identity, $g(\\mu)=\\mu$ — ordinary linear regression.\n**Binomial:** logit, $g(\\mu)=\\ln\\dfrac{\\mu}{1-\\mu}$ (here $\\mu=p$).\n**Poisson:** log, $g(\\mu)=\\ln\\mu$.\nThese are the defaults because each equals the distribution's natural parameter $\\theta$.",
      "tag": "Link functions"
    },
    {
      "front": "Why use a **link function** at all instead of regressing $\\mu$ directly on $X\\beta$?",
      "back": "Because $\\eta=X\\beta$ ranges over all of $\\mathbb{R}$, but $\\mu$ is often constrained: a probability must lie in $(0,1)$ and a count rate must be positive. The link maps the constrained mean onto the whole real line so the linear predictor never produces an impossible value.\nE.g. the **logit** maps $(0,1)\\to(-\\infty,\\infty)$ and the **log** maps $(0,\\infty)\\to(-\\infty,\\infty)$; inverting always returns a legal mean.",
      "tag": "Link functions"
    },
    {
      "front": "Name three common link functions for **binomial** GLMs and their inverse-link (mean) forms.",
      "back": "**Logit:** $\\eta=\\ln\\dfrac{p}{1-p}$, inverse $p=\\dfrac{1}{1+e^{-\\eta}}$ (canonical).\n**Probit:** $\\eta=\\Phi^{-1}(p)$, inverse $p=\\Phi(\\eta)$, using the standard normal CDF.\n**Complementary log-log:** $\\eta=\\ln(-\\ln(1-p))$, inverse $p=1-e^{-e^{\\eta}}$ (asymmetric).\nLogit is canonical and gives interpretable odds ratios; probit and clog-log are alternatives with different tail behavior.",
      "tag": "Link functions"
    },
    {
      "front": "Define the **logistic regression** model and state its link.",
      "back": "Logistic regression models a binary/Bernoulli response with $p=P(Y=1\\mid x)$ via the **logit link**:\n$\\ln\\dfrac{p}{1-p}=\\eta=\\beta_0+\\beta_1x_1+\\cdots+\\beta_px_p$.\nInverting, the fitted probability is $\\hat p=\\dfrac{1}{1+e^{-\\hat\\eta}}=\\dfrac{e^{\\hat\\eta}}{1+e^{\\hat\\eta}}$, the **logistic (sigmoid)** function, always in $(0,1)$.",
      "tag": "Logistic regression"
    },
    {
      "front": "In logistic regression, define the **odds** and the **odds ratio** $e^{\\beta_1}$.",
      "back": "The **odds** of the event are $\\dfrac{p}{1-p}=e^{\\eta}=e^{\\beta_0+\\beta_1x}$.\nIncreasing $x$ by one unit multiplies the odds by $e^{\\beta_1}$, so $e^{\\beta_1}$ is the **odds ratio** per unit increase in $x$.\n$e^{\\beta_1}>1$ means higher odds (positive effect); $e^{\\beta_1}<1$ means lower odds; $e^{\\beta_1}=1$ ($\\beta_1=0$) means $x$ has no effect.",
      "tag": "Logistic regression"
    },
    {
      "front": "A fitted logistic model gives $\\hat\\eta=-2.0+0.5x$. For an applicant with $x=4$, find the predicted **probability** $\\hat p$.",
      "back": "Linear predictor: $\\hat\\eta=-2.0+0.5(4)=0.0$.\n$\\hat p=\\dfrac{1}{1+e^{-\\hat\\eta}}=\\dfrac{1}{1+e^{0}}=\\dfrac{1}{2}=0.5$.\nWith $\\hat\\eta=0$ the odds are $e^{0}=1$, an even-money event, so $\\hat p=0.5$ exactly.",
      "tag": "Logistic regression"
    },
    {
      "front": "A logistic model gives $\\hat\\eta=-1.2+0.8x_1-0.3x_2$. For $x_1=3$, $x_2=2$, find the predicted probability.",
      "back": "$\\hat\\eta=-1.2+0.8(3)-0.3(2)=-1.2+2.4-0.6=0.6$.\n$e^{-0.6}\\approx 0.548812$.\n$\\hat p=\\dfrac{1}{1+e^{-0.6}}=\\dfrac{1}{1+0.548812}=\\dfrac{1}{1.548812}\\approx 0.6456$.\nSo the model predicts about a $64.6\\%$ chance of the event.",
      "tag": "Logistic regression"
    },
    {
      "front": "A logistic regression of default on credit score has $\\hat\\beta_1=-0.04$ per point. Interpret the coefficient and give the odds ratio for a **10-point** increase.",
      "back": "Per one-point increase, the odds of default multiply by $e^{-0.04}\\approx 0.9608$ — a $3.92\\%$ drop in odds.\nFor a $10$-point increase the odds multiply by $e^{-0.04\\times 10}=e^{-0.4}\\approx 0.6703$.\nThat is, raising the score $10$ points cuts the odds of default by about $1-0.6703=33.0\\%$.",
      "tag": "Logistic regression"
    },
    {
      "front": "A logistic model has intercept $\\hat\\beta_0=-1.5$ and a smoker indicator coefficient $\\hat\\beta_1=0.9$. Compare the predicted **probabilities** for non-smokers and smokers.",
      "back": "**Non-smoker** ($x=0$): $\\hat\\eta=-1.5$, $\\hat p=\\dfrac{1}{1+e^{1.5}}=\\dfrac{1}{1+4.481689}\\approx 0.1824$.\n**Smoker** ($x=1$): $\\hat\\eta=-1.5+0.9=-0.6$, $\\hat p=\\dfrac{1}{1+e^{0.6}}=\\dfrac{1}{1+1.822119}\\approx 0.3543$.\nThe odds ratio is $e^{0.9}\\approx 2.460$: smokers' odds are about $2.46\\times$ those of non-smokers (note the probability ratio $0.3543/0.1824\\approx 1.94$ is smaller than the odds ratio).",
      "tag": "Logistic regression"
    },
    {
      "front": "A logistic model predicts $\\hat p=0.30$ for a policyholder. What are the predicted **odds** and **log-odds**?",
      "back": "**Odds** $=\\dfrac{\\hat p}{1-\\hat p}=\\dfrac{0.30}{0.70}\\approx 0.4286$ (about $3$-to-$7$ in favor; equivalently $7$-to-$3$ against).\n**Log-odds (logit)** $=\\ln(0.4286)\\approx -0.8473$, which equals the linear predictor $\\hat\\eta$.\nCheck: $\\hat p=\\dfrac{1}{1+e^{0.8473}}=\\dfrac{1}{1+2.3333}=0.30$ ✓.",
      "tag": "Logistic regression"
    },
    {
      "front": "Define **Poisson regression** and state its model equation with the canonical link.",
      "back": "Poisson regression models a **count** response $Y$ (claims, accidents) with mean $\\mu>0$ using the **log link**:\n$\\ln\\mu=\\eta=\\beta_0+\\beta_1x_1+\\cdots+\\beta_px_p$, so $\\mu=e^{\\eta}=e^{\\beta_0}\\,e^{\\beta_1x_1}\\cdots e^{\\beta_px_p}$.\nThe response is assumed $Y\\sim\\text{Poisson}(\\mu)$, so $\\text{Var}(Y)=\\mu$ (mean equals variance).",
      "tag": "Poisson regression"
    },
    {
      "front": "Why is the effect of a predictor **multiplicative** in Poisson (log-link) regression?",
      "back": "Because $\\mu=e^{\\beta_0}e^{\\beta_1x_1}\\cdots$, a one-unit increase in $x_j$ multiplies the mean by $e^{\\beta_j}$ rather than adding to it.\nSo $e^{\\beta_j}$ is the **rate ratio** (multiplicative factor) per unit of $x_j$: $e^{\\beta_j}>1$ raises the expected count, $e^{\\beta_j}<1$ lowers it. This multiplicative structure matches insurance rating, where rate relativities multiply a base rate.",
      "tag": "Poisson regression"
    },
    {
      "front": "A Poisson claim-count model is $\\ln\\hat\\mu=-1.0+0.3x_1+0.7x_2$. For $x_1=2$, $x_2=1$, find the predicted **mean** count.",
      "back": "$\\ln\\hat\\mu=-1.0+0.3(2)+0.7(1)=-1.0+0.6+0.7=0.3$.\n$\\hat\\mu=e^{0.3}\\approx 1.3499$.\nThe model predicts about $1.35$ expected claims for this risk.",
      "tag": "Poisson regression"
    },
    {
      "front": "In a Poisson regression $\\hat\\beta_j=0.405$ for a risk factor. Interpret it as a **rate ratio**, and give the percent change in expected count.",
      "back": "The rate ratio is $e^{\\hat\\beta_j}=e^{0.405}\\approx 1.4993\\approx 1.50$.\nA one-unit increase in $x_j$ multiplies the expected count by about $1.50$ — a **50% increase** in the predicted mean, holding other predictors fixed.\n(If instead $\\hat\\beta_j=-0.405$, the factor is $e^{-0.405}\\approx 0.667$, a $33.3\\%$ decrease.)",
      "tag": "Poisson regression"
    },
    {
      "front": "What is an **offset** (exposure term) in Poisson regression, and how does it enter the model?",
      "back": "When counts arise over differing **exposure** $E$ (policy-years, miles, population), we model the **rate** $\\mu/E$. With a log link:\n$\\ln\\mu=\\ln E+\\beta_0+\\beta_1x_1+\\cdots$, i.e. $\\mu=E\\cdot e^{\\beta_0+\\beta_1x_1+\\cdots}$.\nThe term $\\ln E$ is the **offset** — a predictor with coefficient fixed at $1$ (not estimated). It scales the expected count proportionally to exposure.",
      "tag": "Poisson regression"
    },
    {
      "front": "A Poisson model with offset $\\ln(\\text{exposure})$ has $\\ln\\hat\\mu=\\ln E-2.0+0.5x$. A policy has exposure $E=3$ policy-years and $x=2$. Find the predicted **claim count**.",
      "back": "$\\ln\\hat\\mu=\\ln 3-2.0+0.5(2)=\\ln 3-2.0+1.0=\\ln 3-1.0$.\n$\\hat\\mu=3\\cdot e^{-1.0}=3(0.367879)\\approx 1.1036$.\nEquivalently the per-year rate is $e^{-1.0}\\approx 0.3679$ claims/year, times $3$ years $\\approx 1.10$ claims.",
      "tag": "Poisson regression"
    },
    {
      "front": "Define **overdispersion** in a Poisson GLM and name a way to detect and accommodate it.",
      "back": "Overdispersion is when the data's variance **exceeds** the Poisson assumption $\\text{Var}(Y)=\\mu$, often because of unmodeled heterogeneity or clustering.\n**Detect:** the (scaled) Pearson or deviance statistic divided by its degrees of freedom is well above $1$.\n**Accommodate:** fit a **quasi-Poisson** model with an estimated dispersion $\\phi>1$ (inflating standard errors by $\\sqrt{\\hat\\phi}$), or switch to a **negative binomial** GLM. Ignoring it makes standard errors too small and significance overstated.",
      "tag": "Poisson regression"
    },
    {
      "front": "Estimate the **dispersion** from a Poisson fit with Pearson statistic $X^{2}=180$ on $n-p=150$ degrees of freedom. Is there overdispersion?",
      "back": "$\\hat\\phi=\\dfrac{X^{2}}{n-p}=\\dfrac{180}{150}=1.2$.\nSince $\\hat\\phi=1.2>1$, there is mild **overdispersion**. Standard errors should be inflated by $\\sqrt{1.2}\\approx 1.095$, so a naive $z=2.0$ becomes about $2.0/1.095\\approx 1.83$ — possibly no longer significant at $5\\%$.",
      "tag": "Poisson regression"
    },
    {
      "front": "How are GLM coefficients estimated? Name the method and the algorithm.",
      "back": "By **maximum likelihood**: choose $\\hat\\beta$ to maximize the log-likelihood $\\ell(\\beta)$, equivalently solving the score equations $\\dfrac{\\partial\\ell}{\\partial\\beta}=0$.\nExcept for the normal-identity case these have no closed form, so they are solved numerically by **iteratively reweighted least squares (IRLS)** — equivalent to Fisher scoring / Newton-Raphson with the expected information. Each iteration solves a weighted least-squares problem on an adjusted response until convergence.",
      "tag": "Estimation & deviance"
    },
    {
      "front": "Why can't logistic and Poisson regression coefficients be found with ordinary least squares like normal regression?",
      "back": "OLS minimizes a sum of squared errors that coincides with the MLE only under **normal errors with constant variance**. In logistic/Poisson GLMs the variance depends on the mean ($p(1-p)$ or $\\mu$) and the link is nonlinear, so the likelihood is not quadratic in $\\beta$ and the score equations are nonlinear.\nThe MLE must be found iteratively (IRLS), which reweights observations by their mean-dependent variances at each step.",
      "tag": "Estimation & deviance"
    },
    {
      "front": "Define the **saturated model** and the **deviance** $D$ of a fitted GLM.",
      "back": "The **saturated model** has one parameter per observation, so it fits perfectly ($\\hat\\mu_i=y_i$) and gives the maximum achievable log-likelihood $\\ell_{\\text{sat}}$.\nThe **deviance** measures lack of fit: $D=2\\phi(\\ell_{\\text{sat}}-\\ell_{\\text{model}})$, or the **scaled deviance** $D^{*}=2(\\ell_{\\text{sat}}-\\ell_{\\text{model}})$ when $\\phi=1$.\nSmaller deviance means the model is closer to the saturated fit; it generalizes the residual sum of squares.",
      "tag": "Estimation & deviance"
    },
    {
      "front": "Define the **deviance residual** and the **Pearson residual** for a GLM.",
      "back": "**Deviance residual:** $r_i^{D}=\\text{sign}(y_i-\\hat\\mu_i)\\sqrt{d_i}$, where $d_i$ is observation $i$'s contribution to the deviance ($D=\\sum_i d_i$).\n**Pearson residual:** $r_i^{P}=\\dfrac{y_i-\\hat\\mu_i}{\\sqrt{V(\\hat\\mu_i)}}$, the raw residual standardized by the estimated standard deviation (the variance function).\nThe sum of squared Pearson residuals is the Pearson $X^{2}$ statistic; deviance residuals sum (of squares) to the deviance.",
      "tag": "Estimation & deviance"
    },
    {
      "front": "Give the **deviance** formula for a Poisson GLM and compute the contribution from one point with $y_i=4$, $\\hat\\mu_i=2.5$.",
      "back": "Poisson deviance: $D=2\\sum_i\\left[y_i\\ln\\dfrac{y_i}{\\hat\\mu_i}-(y_i-\\hat\\mu_i)\\right]$.\nFor $y_i=4$, $\\hat\\mu_i=2.5$: $\\dfrac{y_i}{\\hat\\mu_i}=1.6$, $\\ln 1.6\\approx 0.470004$.\n$d_i=2\\left[4(0.470004)-(4-2.5)\\right]=2\\left[1.880016-1.5\\right]=2(0.380016)\\approx 0.7600$.\nIts deviance residual is $+\\sqrt{0.7600}\\approx +0.872$ (positive since $y_i>\\hat\\mu_i$).",
      "tag": "Estimation & deviance"
    },
    {
      "front": "State the **drop-in-deviance** (likelihood-ratio) test for comparing two **nested** GLMs.",
      "back": "Let the smaller (reduced) model be nested in the larger (full) model, differing by $q$ parameters. With $\\phi$ known (e.g. $=1$):\n$\\Delta D=D_{\\text{reduced}}-D_{\\text{full}}=2(\\ell_{\\text{full}}-\\ell_{\\text{reduced}})\\sim\\chi^{2}_{q}$ under $H_0$ (extra terms add nothing).\nReject $H_0$ (keep the larger model) when $\\Delta D$ exceeds the upper-$\\alpha$ critical value $\\chi^{2}_{q,\\alpha}$. This is the GLM analog of the partial $F$-test.",
      "tag": "Model comparison"
    },
    {
      "front": "Reduced model deviance is $60.0$ ($142$ df); adding $3$ predictors gives full-model deviance $48.0$ ($139$ df). Test whether the $3$ predictors jointly matter at $5\\%$.",
      "back": "$\\Delta D=60.0-48.0=12.0$ with $q=142-139=3$ df.\nCritical value $\\chi^{2}_{3,\\,0.05}=7.815$.\nSince $12.0>7.815$, **reject** $H_0$: at least one of the three added predictors is significant, so the larger model is preferred.\n(The $p$-value is $P(\\chi^{2}_3>12.0)\\approx 0.0074$.)",
      "tag": "Model comparison"
    },
    {
      "front": "A model's deviance drops by $\\Delta D=2.5$ when **one** extra predictor is added. At $5\\%$, is the predictor significant?",
      "back": "$q=1$, so compare to $\\chi^{2}_{1,\\,0.05}=3.841$.\nSince $\\Delta D=2.5<3.841$, **fail to reject** $H_0$: the extra predictor does **not** significantly improve fit, so the simpler model is preferred.\n(For a single coefficient this agrees with a Wald $z$ near $\\sqrt{2.5}\\approx 1.58$, below $1.96$.)",
      "tag": "Model comparison"
    },
    {
      "front": "Define **AIC** and explain why it is used to compare **non-nested** GLMs.",
      "back": "$\\text{AIC}=-2\\ell(\\hat\\beta)+2k$, where $\\ell$ is the maximized log-likelihood and $k$ the number of estimated parameters.\nThe $-2\\ell$ term rewards fit; the $+2k$ term penalizes complexity. **Lower AIC is better.**\nUnlike the drop-in-deviance test, AIC does not require nesting — it can rank models with different predictor sets (or even different link functions / distributions, if on the same data), trading off fit against parsimony.",
      "tag": "Model comparison"
    },
    {
      "front": "Two non-nested GLMs on the same data: Model A has $\\ell=-120.0$ with $k=4$; Model B has $\\ell=-118.0$ with $k=7$. Compare by **AIC**.",
      "back": "$\\text{AIC}_A=-2(-120.0)+2(4)=240.0+8=248.0$.\n$\\text{AIC}_B=-2(-118.0)+2(7)=236.0+14=250.0$.\nLower AIC wins, so **Model A** ($248.0<250.0$) is preferred: B's $4$-point gain in $-2\\ell$ (from $240\\to236$) doesn't justify its $3$ extra parameters (penalty $+6$).",
      "tag": "Model comparison"
    },
    {
      "front": "Define **BIC** and contrast its complexity penalty with AIC's.",
      "back": "$\\text{BIC}=-2\\ell(\\hat\\beta)+k\\ln n$, where $n$ is the sample size. **Lower BIC is better.**\nVs AIC ($+2k$), BIC's penalty is $k\\ln n$, which exceeds $2k$ whenever $\\ln n>2$, i.e. $n>e^{2}\\approx 7.4$. So for any realistic sample BIC penalizes extra parameters **more harshly**, tending to choose **smaller** models and being **consistent** (selects the true model as $n\\to\\infty$).",
      "tag": "Model comparison"
    },
    {
      "front": "With $n=200$: Model A has $\\ell=-120.0$, $k=4$; Model B has $\\ell=-118.0$, $k=7$. Compare by **BIC**.",
      "back": "$\\ln 200\\approx 5.2983$.\n$\\text{BIC}_A=-2(-120.0)+4(5.2983)=240.0+21.193=261.193$.\n$\\text{BIC}_B=-2(-118.0)+7(5.2983)=236.0+37.088=273.088$.\nLower BIC wins, so **Model A** ($261.19<273.09$) — BIC's heavier penalty makes the parsimonious model the even clearer choice than under AIC.",
      "tag": "Model comparison"
    },
    {
      "front": "Two logistic models are compared by deviance. The null (intercept-only) deviance is $200.0$; the fitted model with $4$ predictors has deviance $176.0$. Test overall model significance at $5\\%$.",
      "back": "This is a drop-in-deviance test of all $4$ slopes jointly: $\\Delta D=200.0-176.0=24.0$ with $q=4$ df.\nCritical value $\\chi^{2}_{4,\\,0.05}=9.488$.\nSince $24.0>9.488$, **reject** $H_0:\\beta_1=\\cdots=\\beta_4=0$ — the predictors collectively improve fit significantly. (This is the GLM analog of the overall $F$-test of regression.)",
      "tag": "Model comparison"
    },
    {
      "front": "In a logistic regression, a coefficient has estimate $\\hat\\beta_1=0.84$ with $SE(\\hat\\beta_1)=0.30$. Perform the **Wald test** and give the odds ratio with a $95\\%$ confidence interval.",
      "back": "Wald statistic: $z=\\dfrac{\\hat\\beta_1}{SE(\\hat\\beta_1)}=\\dfrac{0.84}{0.30}=2.8$. Since $|2.8|>1.96$, $\\beta_1$ is significant at $5\\%$.\nPoint odds ratio: $e^{0.84}\\approx 2.316$.\n$95\\%$ CI for $\\beta_1$: $0.84\\pm 1.96(0.30)=0.84\\pm 0.588=(0.252,\\,1.428)$.\nExponentiate for the OR CI: $(e^{0.252},e^{1.428})\\approx(1.287,\\,4.170)$ — excludes $1$, consistent with significance.",
      "tag": "Logistic regression"
    },
    {
      "front": "Compute the **deviance contribution** for a logistic (Bernoulli) observation with $y_i=1$ and fitted $\\hat p_i=0.8$, and one with $y_i=0$, $\\hat p_i=0.3$.",
      "back": "Bernoulli deviance contribution: $d_i=-2\\left[y_i\\ln\\hat p_i+(1-y_i)\\ln(1-\\hat p_i)\\right]$.\nFor $y_i=1$, $\\hat p_i=0.8$: $d_i=-2\\ln(0.8)=-2(-0.223144)\\approx 0.4463$.\nFor $y_i=0$, $\\hat p_i=0.3$: $d_i=-2\\ln(1-0.3)=-2\\ln(0.7)=-2(-0.356675)\\approx 0.7133$.\nWell-fit points (predicted probability near the observed outcome) contribute small deviance; total deviance $=\\sum_i d_i$.",
      "tag": "Estimation & deviance"
    },
    {
      "front": "A grouped logistic regression has residual deviance $D=210$ on $195$ degrees of freedom. Use the deviance as a **goodness-of-fit** check at $5\\%$.",
      "back": "For grouped binomial data, $D\\approx\\chi^{2}_{n-p}$ under good fit. Here compare $D=210$ to $\\chi^{2}_{195}$.\nA quick rule: $D/\\text{df}=210/195\\approx 1.077\\approx 1$, suggesting adequate fit. More precisely, $\\chi^{2}_{195,\\,0.05}\\approx 228.6$; since $210<228.6$, we **do not reject** adequate fit.\n(For ungrouped Bernoulli data the deviance is **not** $\\chi^2$-distributed, so this GOF use requires grouped/binomial counts.)",
      "tag": "Estimation & deviance"
    },
    {
      "front": "Explain why **logistic regression coefficients are interpreted on the log-odds scale**, not as changes in probability.",
      "back": "The model is linear in the **logit**: $\\ln\\dfrac{p}{1-p}=\\beta_0+\\beta_1x$. So $\\beta_1$ is the additive change in **log-odds** per unit $x$, and $e^{\\beta_1}$ is the constant **odds ratio** — these are constant across $x$.\nThe effect on the **probability** is **not** constant: $\\dfrac{\\partial p}{\\partial x}=\\beta_1\\,p(1-p)$, which is largest near $p=0.5$ and shrinks toward $0$ at the extremes. Hence odds ratios are reported, while marginal probability effects depend on the baseline $p$.",
      "tag": "Logistic regression"
    },
    {
      "front": "Contrast the **deviance** and **AIC** as criteria — which compares nested models and which compares any models?",
      "back": "**Deviance / drop-in-deviance test:** a formal hypothesis test ($\\chi^2_q$) for **nested** models; it asks whether extra terms significantly improve fit, but cannot rank models with different (non-overlapping) predictor sets.\n**AIC (and BIC):** information criteria that attach a complexity penalty to $-2\\ell$ and can rank **any** models on the same data — nested or not — by trading fit against parsimony. AIC has no significance threshold; you simply pick the **lowest** value.",
      "tag": "Model comparison"
    },
    {
      "front": "A Poisson rating model gives base rate $e^{\\hat\\beta_0}$ and three multiplicative relativities. With $\\hat\\beta_0=-1.6$, urban factor $e^{0.40}$, young-driver factor $e^{0.25}$, and a discount $e^{-0.15}$, find the expected claim frequency for an urban young driver with the discount.",
      "back": "Multiplicative log-link structure: $\\hat\\mu=e^{\\hat\\beta_0}\\cdot e^{0.40}\\cdot e^{0.25}\\cdot e^{-0.15}=e^{-1.6+0.40+0.25-0.15}=e^{-1.10}$.\n$\\hat\\mu=e^{-1.10}\\approx 0.3329$ claims.\nEquivalently, base $e^{-1.6}\\approx 0.2019$ times the combined relativity $e^{0.50}\\approx 1.6487$ gives $0.2019\\times 1.6487\\approx 0.3329$.",
      "tag": "Poisson regression"
    }
  ]
}