Exam MAS-II — Linear Mixed Models Flashcards

Linear mixed models for CAS Exam MAS-II: the general form $y=X\beta+Zu+\epsilon$ with fixed and random effects, random-intercept and random-slope models, variance components and the intraclass correlation, ML vs REML estimation, the BLUP as a shrinkage estimator, and the exact equivalence between random-effects shrinkage and Bühlmann credibility — with fully worked ICC, BLUP, and variance-component examples.

42 cards · 6 topics · Free · fact-checked · LaTeX math

Browse all 42 cards as a list

Fixed vs random effects
Write the **general linear mixed model** in matrix form and name every piece.
$y = X\beta + Zu + \epsilon$. $y$ = response vector; $X$ = design matrix for the **fixed effects** $\beta$ (population-level coefficients we estimate). $Z$ = design matrix for the **random effects** $u$, with $u\sim N(0,G)$. $\epsilon\sim N(0,R)$ = residual errors, independent of $u$. The "mixed" name comes from having both fixed effects $\beta$ and random effects $u$ in one model.
Variance components
Given $y=X\beta+Zu+\epsilon$ with $u\sim N(0,G)$, $\epsilon\sim N(0,R)$, $u\perp\epsilon$, derive $E(y)$ and $\text{Var}(y)$.
Since $E(u)=0$ and $E(\epsilon)=0$: $E(y)=X\beta$ — only the fixed effects shift the mean. $\text{Var}(y)=Z\,\text{Var}(u)\,Z^{\top} + \text{Var}(\epsilon) = ZGZ^{\top}+R$. The random effects induce **correlation** among observations that share a level of $u$ (through $ZGZ^{\top}$); the model is marginally $y\sim N(X\beta,\;ZGZ^{\top}+R)$.
Fixed vs random effects
When should a factor be modeled as a **fixed** effect versus a **random** effect?
**Fixed:** the levels in the data are the only ones of interest (e.g. treatment vs control, male/female). You estimate a separate parameter for each level and inference is conditional on exactly those levels. **Random:** the observed levels are a **sample from a larger population** of levels (e.g. 50 randomly chosen policyholders, schools, regions) and you want to generalize to the population and **borrow strength** across levels. You estimate the *variance* of the level effects, not each effect individually.
Fixed vs random effects
What does it mean that random effects let you "borrow strength" / "partially pool"?
A fixed-effects model fits each group independently (**no pooling**): small groups get noisy estimates. A single grand mean ignores groups (**complete pooling**). A random-effects model **partially pools**: each group's estimate is pulled from its own noisy mean toward the overall mean by an amount that depends on how much data the group has and how much groups truly vary. Small/noisy groups are shrunk more, so estimates are more stable than no-pooling and more flexible than complete pooling.
Random intercept & slope
Write the **random-intercept model** for observation $j$ in group $i$ and state the distributional assumptions.
$y_{ij}=\beta_0 + u_i + \epsilon_{ij}$, for groups $i=1,\dots,m$ and observations $j=1,\dots,n_i$. $u_i\sim N(0,\sigma_u^2)$ are the i.i.d. group-level random intercepts; $\epsilon_{ij}\sim N(0,\sigma_e^2)$ are i.i.d. residuals; $u_i\perp\epsilon_{ij}$. Group $i$'s mean is $\beta_0+u_i$; the fixed $\beta_0$ is the population mean, and $u_i$ is that group's random deviation from it.
Random intercept & slope
In a random-intercept model with a covariate, $y_{ij}=\beta_0+\beta_1 x_{ij}+u_i+\epsilon_{ij}$, what does each random intercept $u_i$ do geometrically?
All groups share the **same slope** $\beta_1$, but each group's regression line is shifted **up or down** by $u_i$ — a family of parallel lines, one per group. The fixed line is $\beta_0+\beta_1 x$; group $i$'s line is $(\beta_0+u_i)+\beta_1 x$. Only the intercept varies randomly across groups; the response-to-$x$ relationship is common.
Random intercept & slope
Write a **random-slope (and intercept)** model and describe what varies across groups.
$y_{ij}=\beta_0 + \beta_1 x_{ij} + u_{0i} + u_{1i}x_{ij} + \epsilon_{ij}$, where $\begin{pmatrix}u_{0i}\\ u_{1i}\end{pmatrix}\sim N\!\left(0,\,G\right)$ with $G=\begin{pmatrix}\sigma_{u0}^2 & \sigma_{u01}\\ \sigma_{u01} & \sigma_{u1}^2\end{pmatrix}$. Now both the **intercept** ($\beta_0+u_{0i}$) and the **slope** ($\beta_1+u_{1i}$) vary by group, so groups can differ in level *and* in how strongly $y$ responds to $x$. The covariance $\sigma_{u01}$ lets high-intercept groups tend to have higher (or lower) slopes.
Random intercept & slope
In the random-slope covariance matrix $G=\begin{pmatrix}\sigma_{u0}^2 & \sigma_{u01}\\ \sigma_{u01} & \sigma_{u1}^2\end{pmatrix}$, what does a positive $\sigma_{u01}$ mean, and how is the intercept–slope correlation found?
A **positive** $\sigma_{u01}$ means groups with a higher-than-average intercept also tend to have a higher-than-average slope (their lines fan out upward). The random-effects correlation is $\text{corr}(u_{0i},u_{1i})=\dfrac{\sigma_{u01}}{\sigma_{u0}\,\sigma_{u1}}$. E.g. $\sigma_{u0}^2=4$, $\sigma_{u1}^2=0.25$, $\sigma_{u01}=0.6$ gives $\dfrac{0.6}{\sqrt{4}\sqrt{0.25}}=\dfrac{0.6}{2(0.5)}=0.6$.
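As a quick check of the arithmetic in this card, the following Python sketch computes the intercept-slope correlation from the entries of $G$. The function name `random_effects_corr` is hypothetical, and the determinant check is an added sanity test rather than part of the card.

```python
import math

def random_effects_corr(var_u0: float, var_u1: float, cov_u01: float) -> float:
    """Correlation between random intercept and random slope from the entries of G."""
    if var_u0 <= 0 or var_u1 <= 0:
        raise ValueError("variances must be positive")
    return cov_u01 / (math.sqrt(var_u0) * math.sqrt(var_u1))

# Values from the card: sigma_u0^2 = 4, sigma_u1^2 = 0.25, sigma_u01 = 0.6
rho = random_effects_corr(4.0, 0.25, 0.6)

# A valid 2x2 covariance matrix needs a non-negative determinant (|rho| <= 1).
det_G = 4.0 * 0.25 - 0.6 ** 2
print(rho, det_G >= 0)
```

If the determinant were negative, the stated $\sigma_{u01}$ would imply a correlation outside $[-1,1]$ and $G$ would not be a valid covariance matrix.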
Variance components
What are the **variance components** of a random-intercept model, and what does each capture?
$\sigma_u^2$ = **between-group** variance — how much the true group means $\beta_0+u_i$ scatter around the population mean. $\sigma_e^2$ = **within-group** (residual) variance — scatter of observations around their own group mean. The total variance of a single observation is $\text{Var}(y_{ij})=\sigma_u^2+\sigma_e^2$. "Estimating variance components" means estimating $\sigma_u^2$ and $\sigma_e^2$ (rather than each $u_i$).
Variance components
For the random-intercept model $y_{ij}=\beta_0+u_i+\epsilon_{ij}$, derive $\text{Var}(y_{ij})$, $\text{Cov}(y_{ij},y_{ik})$ for $j\neq k$, and $\text{Cov}$ across different groups.
Same group, same obs: $\text{Var}(y_{ij})=\sigma_u^2+\sigma_e^2$. Same group, different obs ($j\neq k$): they share $u_i$, so $\text{Cov}(y_{ij},y_{ik})=\sigma_u^2$. Different groups ($i\neq i'$): independent $u$'s and $\epsilon$'s, so $\text{Cov}(y_{ij},y_{i'k})=0$. Thus within a group the responses are **equicorrelated** with correlation $\dfrac{\sigma_u^2}{\sigma_u^2+\sigma_e^2}$ (the ICC).
ICC & credibility link
Define the **intraclass correlation coefficient (ICC)** for a random-intercept model and interpret it.
$\rho = \dfrac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}$. It is the correlation between any two observations in the **same** group, and equivalently the **proportion of total variance that is between groups**. $\rho\to 0$: groups are essentially alike (random effect adds little); $\rho\to 1$: almost all variation is between groups (observations within a group are nearly identical). High ICC means group membership matters a lot.
ICC & credibility link
Compute the ICC: a random-intercept model gives $\hat\sigma_u^2 = 18$ (between-group) and $\hat\sigma_e^2 = 42$ (within-group). Interpret.
$\rho = \dfrac{\sigma_u^2}{\sigma_u^2+\sigma_e^2} = \dfrac{18}{18+42} = \dfrac{18}{60} = 0.30$. So $30\%$ of the total variation in $y$ is **between** groups and $70\%$ is within groups, and any two observations from the same group correlate $0.30$. A moderate ICC — group membership explains a meaningful but minority share of variability.
ICC & credibility link
Compute the ICC: in a multilevel claims model the policyholder-level variance is $\hat\sigma_u^2 = 0.045$ and the residual variance is $\hat\sigma_e^2 = 0.255$.
$\rho = \dfrac{0.045}{0.045+0.255} = \dfrac{0.045}{0.300} = 0.15$. Only $15\%$ of the variance sits between policyholders; the within-policyholder correlation of repeated observations is $0.15$. Low-to-moderate ICC: $k=0.255/0.045\approx 5.67$, so a policyholder with few observations gets little credibility (e.g. $Z=\tfrac{1}{1+5.67}=0.15$ for $n=1$) and is pooled heavily toward the overall mean.
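Both ICC computations above are the same ratio applied to different variance components. A minimal Python sketch, where the function name `icc` and its input checks are illustrative assumptions:

```python
def icc(sigma_u2: float, sigma_e2: float) -> float:
    """Intraclass correlation: between-group share of total variance."""
    total = sigma_u2 + sigma_e2
    if sigma_u2 < 0 or sigma_e2 < 0 or total == 0:
        raise ValueError("variance components must be non-negative and not both zero")
    return sigma_u2 / total

# First card: between-group 18, within-group 42.
print(round(icc(18, 42), 2))
# Second card: policyholder-level 0.045, residual 0.255.
print(round(icc(0.045, 0.255), 2))
```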
Variance components
A study reports total variance $\hat\sigma_u^2+\hat\sigma_e^2 = 50$ and an ICC of $\rho = 0.36$. Recover the two variance components.
$\rho=\dfrac{\sigma_u^2}{\sigma_u^2+\sigma_e^2}$, so $\sigma_u^2 = \rho\,(\sigma_u^2+\sigma_e^2) = 0.36(50)=18$. Then $\sigma_e^2 = 50 - 18 = 32$. Check: $\dfrac{18}{18+32}=\dfrac{18}{50}=0.36$. ✓
Variance components
How does the **design (Z) matrix** encode a random intercept for a 2-group, 4-observation data set (groups A,A,B,B)? Give $Z$, $G$, and the implied $\text{Var}(y)$ structure.
With one random intercept per group, $u=(u_A,u_B)^{\top}$ and $Z=\begin{pmatrix}1&0\\1&0\\0&1\\0&1\end{pmatrix}$ (a $1$ in the column of the observation's group). $G=\sigma_u^2 I_2$ and $R=\sigma_e^2 I_4$. Then $ZGZ^{\top}+R$ is **block-diagonal**: a $2\times 2$ block per group with $\sigma_u^2+\sigma_e^2$ on the diagonal and $\sigma_u^2$ off-diagonal, and zeros between groups — exactly the equicorrelation structure.
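The block-diagonal structure can be confirmed numerically. The sketch below builds $Z$, $G$ and $R$ for the A,A,B,B layout with NumPy; the values $\sigma_u^2=4$ and $\sigma_e^2=10$ are hypothetical and chosen only to make the pattern visible.

```python
import numpy as np

# Hypothetical variance components, for illustration only.
sigma_u2 = 4.0    # between-group variance
sigma_e2 = 10.0   # within-group (residual) variance

# Observations ordered A, A, B, B; columns of Z correspond to (u_A, u_B).
Z = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]], dtype=float)
G = sigma_u2 * np.eye(2)   # Var(u)
R = sigma_e2 * np.eye(4)   # Var(epsilon)

V = Z @ G @ Z.T + R        # marginal Var(y)
print(V)
```

Each $2\times 2$ block has $\sigma_u^2+\sigma_e^2=14$ on the diagonal and $\sigma_u^2=4$ off the diagonal, with zeros between the two groups.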
Estimation (ML/REML)
Contrast **Maximum Likelihood (ML)** and **Restricted/Residual ML (REML)** for estimating variance components.
**ML** maximizes the full likelihood of $y$ over $\beta$ and the variance components jointly. Its variance estimates behave as if $\beta$ were known: they do **not** account for the degrees of freedom used to estimate $\beta$, so they are **biased downward** (too small), especially with few groups. **REML** maximizes the likelihood of **error contrasts** (linear combinations of $y$ with mean $0$, free of $\beta$), so estimating $\beta$ doesn't deflate the variance estimates. REML gives **(approximately) unbiased** variance-component estimates.
Estimation (ML/REML)
Why is REML the default for variance components, but what is its key limitation versus ML?
**Use REML** when the goal is accurate variance components (its estimates are approximately unbiased, analogous to dividing by $n-1$ rather than $n$ for a sample variance). **Limitation:** REML likelihoods are **not comparable across models with different fixed-effects structures**, because they're built from different error contrasts. So you **cannot** use REML log-likelihoods / AIC / likelihood-ratio tests to compare models that differ in their fixed effects $X$; for that you must refit with **ML**. Use REML to compare/choose *random* structures (with the fixed effects held the same) and to report final variance estimates.
Estimation (ML/REML)
Illustrate the ML downward bias: a balanced one-way model has $m=4$ groups, each with $n=5$ observations. The estimated between-group mean square is $MS_B = 30$ and within-group mean square is $MS_W=10$. Find the REML/ANOVA $\hat\sigma_e^2$ and $\hat\sigma_u^2$.
For a balanced one-way random model the ANOVA (= REML here) estimators are: $\hat\sigma_e^2 = MS_W = 10$. $\hat\sigma_u^2 = \dfrac{MS_B - MS_W}{n} = \dfrac{30-10}{5} = \dfrac{20}{5} = 4$. So between-group variance $\hat\sigma_u^2=4$, within $\hat\sigma_e^2=10$. (ML replaces $MS_B$ by $\tfrac{m-1}{m}MS_B=\tfrac34(30)=22.5$, giving $\hat\sigma_u^2=\dfrac{22.5-10}{5}=2.5<4$, which illustrates ML's downward bias with few groups.)
ICC & credibility link
Using $MS_B=30$, $MS_W=10$, $n=5$ from the balanced 4-group model, compute the ICC from the variance components.
From the ANOVA estimators $\hat\sigma_u^2 = \dfrac{MS_B-MS_W}{n}=\dfrac{30-10}{5}=4$ and $\hat\sigma_e^2 = MS_W = 10$. $\rho = \dfrac{\hat\sigma_u^2}{\hat\sigma_u^2+\hat\sigma_e^2} = \dfrac{4}{4+10}=\dfrac{4}{14}\approx 0.286$. About $28.6\%$ of variation is between groups.
BLUP & shrinkage
What is the **BLUP** of a random effect, and how does it differ from the fixed-effect estimate $\hat\beta$?
BLUP = **Best Linear Unbiased Predictor**. We **estimate** fixed effects $\beta$ but **predict** random effects $u$, because $u$ is a random variable, not a fixed parameter. The BLUP $\hat u$ is the best (minimum-MSE) linear unbiased predictor of the realized $u$. "Best" = smallest prediction variance; "unbiased" = $E(\hat u - u)=0$. Unlike $\hat\beta$, the BLUP is **shrunk** toward $0$ (the prior mean of $u$).
BLUP & shrinkage
State the **BLUP of the random intercept** $u_i$ in the model $y_{ij}=\beta_0+u_i+\epsilon_{ij}$, and explain the shrinkage.
$\hat u_i = \dfrac{n_i\,\sigma_u^2}{n_i\,\sigma_u^2 + \sigma_e^2}\,\big(\bar y_i - \hat\beta_0\big)$, where $\bar y_i$ is group $i$'s sample mean and $n_i$ its size. The raw deviation $\bar y_i-\hat\beta_0$ is multiplied by a factor in $(0,1)$, so $\hat u_i$ is **shrunk toward $0$** (the prior mean). The predicted group mean is $\hat\beta_0+\hat u_i$. Larger $n_i$ or larger $\sigma_u^2$ → factor closer to $1$ → less shrinkage (trust the group's own data more).
ICC & credibility link
Show that the BLUP shrinkage factor equals the **Bühlmann credibility factor** $Z=\dfrac{n}{n+k}$, and identify $k$.
Divide the BLUP weight $\dfrac{n_i\sigma_u^2}{n_i\sigma_u^2+\sigma_e^2}$ top and bottom by $\sigma_u^2$: $\dfrac{n_i\sigma_u^2}{n_i\sigma_u^2+\sigma_e^2}=\dfrac{n_i}{n_i + \sigma_e^2/\sigma_u^2}=\dfrac{n_i}{n_i+k}$, with $k=\dfrac{\sigma_e^2}{\sigma_u^2}$. This is exactly Bühlmann's $Z=\dfrac{n}{n+k}$ with $k=\dfrac{\text{EPV}}{\text{VHM}}$, identifying $\sigma_e^2=\text{EPV}$ (within = expected process variance) and $\sigma_u^2=\text{VHM}$ (between = variance of hypothetical means).
ICC & credibility link
Spell out the dictionary between the **linear mixed model** and **Bühlmann credibility**.
Random-effects (LMM) ↔ Credibility: • $\sigma_e^2$ (within / residual) ↔ **EPV** (expected process variance). • $\sigma_u^2$ (between groups) ↔ **VHM** (variance of the hypothetical means). • $k=\sigma_e^2/\sigma_u^2$ ↔ $k=\text{EPV}/\text{VHM}$. • shrinkage factor $\dfrac{n}{n+k}$ ↔ credibility $Z$. • fixed grand mean $\hat\beta_0$ ↔ the collective/manual mean $\mu$. • BLUP group mean $\hat\beta_0+\hat u_i$ ↔ credibility premium $Z\bar y_i+(1-Z)\mu$. A random-intercept LMM **is** the Bühlmann model.
BLUP & shrinkage
Show algebraically that the BLUP group mean $\hat\beta_0+\hat u_i$ equals the credibility-weighted average $Z\bar y_i+(1-Z)\hat\beta_0$.
With $Z=\dfrac{n_i\sigma_u^2}{n_i\sigma_u^2+\sigma_e^2}$ and $\hat u_i = Z(\bar y_i-\hat\beta_0)$: $\hat\beta_0+\hat u_i = \hat\beta_0 + Z(\bar y_i-\hat\beta_0) = Z\bar y_i + (1-Z)\hat\beta_0$. The predicted group mean is a **credibility weighting** of the group's own mean $\bar y_i$ (weight $Z$) and the overall mean $\hat\beta_0$ (weight $1-Z$) — the Bühlmann credibility premium.
BLUP & shrinkage
Worked BLUP: a random-intercept model has $\hat\beta_0 = 200$, $\sigma_u^2 = 900$, $\sigma_e^2 = 3600$. Group A has $n_A = 4$ observations with sample mean $\bar y_A = 260$. Find $\hat u_A$ and the predicted group mean.
Shrinkage / credibility factor: $Z = \dfrac{n_A\sigma_u^2}{n_A\sigma_u^2+\sigma_e^2}=\dfrac{4(900)}{4(900)+3600}=\dfrac{3600}{3600+3600}=\dfrac{3600}{7200}=0.5$. $\hat u_A = Z(\bar y_A-\hat\beta_0)=0.5(260-200)=0.5(60)=30$. Predicted group mean $=\hat\beta_0+\hat u_A = 200+30 = 230$. Equivalently $Z\bar y_A+(1-Z)\hat\beta_0 = 0.5(260)+0.5(200)=230$. The raw $260$ is pulled halfway toward $200$.
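The worked example maps directly onto two small functions. This is a sketch that assumes the variance components are known; the names `shrinkage_factor` and `blup_group_mean` are hypothetical.

```python
def shrinkage_factor(n: int, sigma_u2: float, sigma_e2: float) -> float:
    """BLUP weight n*sigma_u^2 / (n*sigma_u^2 + sigma_e^2) for a random intercept."""
    return n * sigma_u2 / (n * sigma_u2 + sigma_e2)

def blup_group_mean(beta0: float, ybar: float, n: int,
                    sigma_u2: float, sigma_e2: float) -> tuple[float, float]:
    """Return (u_hat, predicted group mean) for one group."""
    z = shrinkage_factor(n, sigma_u2, sigma_e2)
    u_hat = z * (ybar - beta0)      # raw deviation shrunk toward 0
    return u_hat, beta0 + u_hat

# Group A from the card: beta0 = 200, ybar = 260, n = 4, sigma_u^2 = 900, sigma_e^2 = 3600
u_hat_A, mean_A = blup_group_mean(200.0, 260.0, 4, 900.0, 3600.0)
print(u_hat_A, mean_A)
```

For Group A this reproduces $\hat u_A=30$ and a predicted mean of $230$.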
BLUP & shrinkage
Using $\sigma_u^2=900$, $\sigma_e^2=3600$ (so $k=4$), tabulate the credibility factor $Z=\dfrac{n}{n+k}$ for groups with $n=1,4,16,36$ observations.
$k=\dfrac{\sigma_e^2}{\sigma_u^2}=\dfrac{3600}{900}=4$. $Z=\dfrac{n}{n+4}$: $n=1:\ \dfrac{1}{5}=0.20$. $n=4:\ \dfrac{4}{8}=0.50$. $n=16:\ \dfrac{16}{20}=0.80$. $n=36:\ \dfrac{36}{40}=0.90$. More observations → more credibility → less shrinkage toward the grand mean.
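The same table can be generated in a few lines of Python. The helper name `credibility_factor` is an assumption made for this sketch.

```python
def credibility_factor(n: int, k: float) -> float:
    """Buhlmann credibility Z = n / (n + k)."""
    return n / (n + k)

k = 3600 / 900   # sigma_e^2 / sigma_u^2 = EPV / VHM
table = {n: credibility_factor(n, k) for n in (1, 4, 16, 36)}
for n, z in table.items():
    print(f"n={n:>2}  Z={z:.2f}")
```

Because $Z$ is increasing in $n$ but bounded by $1$, each additional observation adds less credibility than the one before it.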
BLUP & shrinkage
Worked BLUP with little data: $\hat\beta_0=1000$, $\sigma_u^2=200$, $\sigma_e^2=5400$. Group B has $n_B=3$ with $\bar y_B = 1300$. Find the predicted group mean and comment on the shrinkage.
$k=\dfrac{\sigma_e^2}{\sigma_u^2}=\dfrac{5400}{200}=27$. $Z=\dfrac{n_B}{n_B+k}=\dfrac{3}{3+27}=\dfrac{3}{30}=0.10$. $\hat u_B = Z(\bar y_B-\hat\beta_0)=0.10(1300-1000)=0.10(300)=30$. Predicted mean $=1000+30=1030$. With small $n$ and large $k$, only $10\%$ credibility is given to the group's own $\bar y_B=1300$; it is **heavily shrunk** toward the grand mean $1000$.
BLUP & shrinkage
Worked BLUP with much data: $\hat\beta_0=50$, $\sigma_u^2=16$, $\sigma_e^2=64$. Group C has $n_C=24$ with $\bar y_C = 62$. Find the BLUP intercept and predicted mean.
$k=\dfrac{64}{16}=4$. $Z=\dfrac{n_C}{n_C+k}=\dfrac{24}{24+4}=\dfrac{24}{28}\approx 0.857$. $\hat u_C = Z(\bar y_C-\hat\beta_0)=0.857(62-50)=0.857(12)\approx 10.29$. Predicted mean $=50+10.29 = 60.29$. With $24$ observations the group earns $\approx 85.7\%$ credibility, so its estimate stays close to its own $\bar y_C=62$ (little shrinkage).
ICC & credibility link
Predict a **future observation / next-period mean** for a group using credibility. $\mu=\hat\beta_0=500$, $\sigma_u^2=\text{VHM}=1200$, $\sigma_e^2=\text{EPV}=4800$, group has $n=8$ past observations averaging $\bar y=650$.
$k=\dfrac{\text{EPV}}{\text{VHM}}=\dfrac{4800}{1200}=4$, $Z=\dfrac{8}{8+4}=\dfrac{8}{12}=\dfrac{2}{3}\approx 0.667$. Credibility-weighted (BLUP) prediction $= Z\bar y+(1-Z)\mu = \dfrac{2}{3}(650)+\dfrac{1}{3}(500) = 433.33+166.67 = 600$. This is both the predicted group mean and the Bühlmann estimate of the group's next observation.
ICC & credibility link
Relate the **ICC** $\rho$ to the credibility factor $Z$ for a group of size $n$.
Since $k=\dfrac{\sigma_e^2}{\sigma_u^2}$ and $\rho=\dfrac{\sigma_u^2}{\sigma_u^2+\sigma_e^2}$, note $k=\dfrac{1-\rho}{\rho}$. Substituting, $Z=\dfrac{n}{n+k}=\dfrac{n}{\,n+\frac{1-\rho}{\rho}\,}=\dfrac{n\rho}{n\rho + (1-\rho)}$. A **higher ICC** (groups differ a lot) ⇒ smaller $k$ ⇒ **higher credibility** $Z$ for the same $n$: when between-group variance dominates, a group's own data is more informative.
ICC & credibility link
Compute $Z$ from an ICC: a model has ICC $\rho = 0.25$. A group has $n=9$ observations. Find the credibility/shrinkage factor two ways.
Via $k$: $k=\dfrac{1-\rho}{\rho}=\dfrac{0.75}{0.25}=3$, so $Z=\dfrac{n}{n+k}=\dfrac{9}{9+3}=\dfrac{9}{12}=0.75$. Via the direct formula: $Z=\dfrac{n\rho}{n\rho+(1-\rho)}=\dfrac{9(0.25)}{9(0.25)+0.75}=\dfrac{2.25}{2.25+0.75}=\dfrac{2.25}{3}=0.75$. ✓ The group earns $75\%$ credibility.
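The agreement of the two routes can be checked for any $\rho$ and $n$, not only the values on the card. A small sketch with hypothetical function names:

```python
def z_via_k(n: int, rho: float) -> float:
    """Credibility through k = (1 - rho) / rho, then Z = n / (n + k)."""
    k = (1 - rho) / rho
    return n / (n + k)

def z_direct(n: int, rho: float) -> float:
    """Credibility straight from the ICC: Z = n*rho / (n*rho + 1 - rho)."""
    return n * rho / (n * rho + (1 - rho))

# Card values: rho = 0.25, n = 9
print(z_via_k(9, 0.25), z_direct(9, 0.25))
```

Both functions assume $0<\rho<1$; at $\rho=0$ the first route divides by zero, while the direct formula returns $Z=0$.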
BLUP & shrinkage
Why does the BLUP shrink **more** when a group has **fewer** observations or when **between-group** variance is small?
The factor $Z=\dfrac{n}{n+k}$ with $k=\sigma_e^2/\sigma_u^2$ governs trust in the group's own mean. • **Fewer observations** ($n\downarrow$): the group mean $\bar y_i$ is noisier, so $Z\downarrow$ and we lean on the stable grand mean — more shrinkage. • **Small $\sigma_u^2$** (groups nearly alike): $k\uparrow$, so $Z\downarrow$ — if groups barely differ, a group's deviation is probably noise, so shrink it hard. Conversely large $n$ or large $\sigma_u^2$ → $Z\to 1$ → little shrinkage.
BLUP & shrinkage
Compare three groups under $\sigma_u^2=100$, $\sigma_e^2=400$, grand mean $\hat\beta_0=20$: Group 1 ($n=2$, $\bar y=30$), Group 2 ($n=8$, $\bar y=30$), Group 3 ($n=20$, $\bar y=30$). Find each predicted mean.
$k=\dfrac{400}{100}=4$, and each raw deviation is $30-20=10$. Group 1: $Z=\dfrac{2}{2+4}=\dfrac13\approx0.333$; mean $=20+0.333(10)=23.33$. Group 2: $Z=\dfrac{8}{8+4}=\dfrac23\approx0.667$; mean $=20+0.667(10)=26.67$. Group 3: $Z=\dfrac{20}{20+4}=\dfrac{20}{24}\approx0.833$; mean $=20+0.833(10)=28.33$. Same observed mean, but larger groups are shrunk less and end up closer to $30$.
Estimation (ML/REML)
Estimate the variance components from a small balanced data set. Four groups of $n=3$ have group means $\bar y_i = 18, 22, 26, 30$ (grand mean $\bar y = 24$) and within-group mean square $MS_W = 9$. Find $\hat\sigma_e^2$ and $\hat\sigma_u^2$.
Within: $\hat\sigma_e^2 = MS_W = 9$. Between: $MS_B = \dfrac{n\sum_i(\bar y_i-\bar y)^2}{m-1}$. Deviations $-6,-2,2,6$ ⇒ squares $36,4,4,36$ summing to $80$. With $n=3$, $m=4$: $MS_B=\dfrac{3(80)}{3}=80$. $\hat\sigma_u^2 = \dfrac{MS_B - MS_W}{n}=\dfrac{80-9}{3}=\dfrac{71}{3}\approx 23.67$. So $\hat\sigma_e^2=9$, $\hat\sigma_u^2\approx 23.67$.
ICC & credibility link
Continuing the four-group example ($\hat\sigma_u^2\approx 23.67$, $\hat\sigma_e^2=9$, $n=3$, grand mean $24$), compute the ICC and the BLUP for the group with $\bar y_i = 30$.
ICC: $\rho=\dfrac{23.67}{23.67+9}=\dfrac{23.67}{32.67}\approx 0.724$ — about $72\%$ of variance is between groups. $k=\dfrac{9}{23.67}\approx 0.380$; $Z=\dfrac{n}{n+k}=\dfrac{3}{3+0.380}=\dfrac{3}{3.380}\approx 0.888$. BLUP mean $=24 + 0.888(30-24)=24+0.888(6)=24+5.33=29.33$. High ICC and $n=3$ give strong credibility ($\approx 0.89$), so little shrinkage off $30$.
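The four-group example can be reproduced end to end from the group means. The sketch below assumes a balanced design and uses a hypothetical helper, `one_way_anova_components`. It carries unrounded values, so the BLUP mean comes out at about $29.325$ rather than the $29.33$ obtained after rounding $Z$ to $0.888$.

```python
def one_way_anova_components(group_means, n, ms_w):
    """ANOVA (method-of-moments) estimates for a balanced one-way random model."""
    m = len(group_means)
    grand_mean = sum(group_means) / m
    ss_between = n * sum((g - grand_mean) ** 2 for g in group_means)
    ms_b = ss_between / (m - 1)
    sigma_e2 = ms_w
    sigma_u2 = (ms_b - ms_w) / n   # can be negative; not truncated in this sketch
    return grand_mean, ms_b, sigma_u2, sigma_e2

n = 3
grand_mean, ms_b, sigma_u2, sigma_e2 = one_way_anova_components([18, 22, 26, 30], n, ms_w=9)

icc = sigma_u2 / (sigma_u2 + sigma_e2)
z = n / (n + sigma_e2 / sigma_u2)                 # credibility for a group of size n
blup_mean = grand_mean + z * (30 - grand_mean)    # group with sample mean 30
print(ms_b, sigma_u2, icc, z, blup_mean)
```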
Estimation (ML/REML)
What can go wrong when an ANOVA/method-of-moments variance-component estimate gives $MS_B < MS_W$, and how is it handled?
Then $\hat\sigma_u^2 = \dfrac{MS_B - MS_W}{n} < 0$ — a **negative variance estimate**, which is impossible for a true variance. It happens by sampling noise when the real $\sigma_u^2$ is near $0$. Standard fix: **truncate at $0$** ($\hat\sigma_u^2 = \max(0,\cdot)$). ML/REML naturally constrain the estimate to be $\ge 0$ (a boundary solution), which is a reason to prefer likelihood-based estimation. A boundary $\hat\sigma_u^2=0$ implies $Z=0$ — no credibility to any group, complete pooling.
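A short sketch of the truncation rule and its consequence for credibility. The mean squares are hypothetical values chosen so that $MS_B<MS_W$, and the function name is an assumption.

```python
def truncated_sigma_u2(ms_b: float, ms_w: float, n: int) -> float:
    """Method-of-moments between-group variance, truncated at zero."""
    return max(0.0, (ms_b - ms_w) / n)

# Hypothetical balanced design: n = 5 per group, MS_B = 8 < MS_W = 10.
n = 5
sigma_e2 = 10.0
raw = (8.0 - 10.0) / n                        # untruncated estimate, negative
sigma_u2 = truncated_sigma_u2(8.0, 10.0, n)   # truncated estimate

# With sigma_u^2 = 0 the shrinkage factor is 0: every group is predicted by the grand mean.
z = n * sigma_u2 / (n * sigma_u2 + sigma_e2)
print(raw, sigma_u2, z)
```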
Estimation (ML/REML)
How are **fixed effects $\hat\beta$** and **random effects $\hat u$** obtained jointly, and what is the GLS estimator of $\beta$?
They solve **Henderson's mixed-model equations** simultaneously, given the variance components $G,R$. Eliminating $u$ gives the generalized least squares (GLS) estimator with $V=ZGZ^{\top}+R$: $\hat\beta = (X^{\top}V^{-1}X)^{-1}X^{\top}V^{-1}y$. Then the BLUP is $\hat u = GZ^{\top}V^{-1}(y-X\hat\beta)$. In practice $G,R$ are unknown, so we plug in ML/REML estimates, yielding the **empirical** BLUP (EBLUP).
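The matrix formulas can be checked against the scalar shrinkage formula on a toy random-intercept data set. Everything numeric below (the data, the group labels, and the variance components, which are treated as known) is hypothetical. The sketch uses NumPy linear solves in place of explicit inverses.

```python
import numpy as np

# Hypothetical toy data: 5 observations in 2 groups (labels 0 and 1).
y = np.array([10.0, 14.0, 12.0, 20.0, 24.0])
group = np.array([0, 0, 0, 1, 1])
sigma_u2, sigma_e2 = 4.0, 10.0    # assumed known for this sketch

n_obs, n_groups = y.size, int(group.max()) + 1
X = np.ones((n_obs, 1))                  # fixed intercept only
Z = np.zeros((n_obs, n_groups))
Z[np.arange(n_obs), group] = 1.0         # indicator of each observation's group
G = sigma_u2 * np.eye(n_groups)
R = sigma_e2 * np.eye(n_obs)
V = Z @ G @ Z.T + R

# GLS: beta_hat = (X' V^-1 X)^-1 X' V^-1 y
Vinv_X = np.linalg.solve(V, X)
Vinv_y = np.linalg.solve(V, y)
beta_hat = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)

# BLUP: u_hat = G Z' V^-1 (y - X beta_hat)
resid = y - X @ beta_hat
u_hat = G @ Z.T @ np.linalg.solve(V, resid)

# Cross-check with u_i = Z_i * (ybar_i - beta0_hat), Z_i = n_i s_u2 / (n_i s_u2 + s_e2).
n_i = Z.sum(axis=0)
ybar_i = (Z.T @ y) / n_i
z_i = n_i * sigma_u2 / (n_i * sigma_u2 + sigma_e2)
u_check = z_i * (ybar_i - beta_hat[0])
print(beta_hat, u_hat, u_check)
```

With unequal group sizes the GLS intercept is not the simple average of the observations; it equals the credibility-weighted average of the group means, $\sum_i Z_i\bar y_i / \sum_i Z_i$.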
Estimation (ML/REML)
What is the **EBLUP**, and why does it understate prediction uncertainty?
The **EBLUP** (empirical BLUP) is the BLUP with the unknown variance components $G,R$ replaced by their ML/REML estimates. It is what software reports as the predicted random effects. Because it treats $\hat G,\hat R$ as if they were the true values, the naive BLUP prediction-variance formula **ignores the extra uncertainty** from estimating the variance components, so reported standard errors are **too small**. Corrections (e.g. Kackar–Harville) inflate the variance to account for this.
Fixed vs random effects
How do **fixed-effects-only**, **complete-pooling**, and **random-effects** predictions of a group mean compare? Use $\hat\beta_0=40$, $\bar y_i = 70$, $n_i=5$, $\sigma_u^2=50$, $\sigma_e^2=200$.
**No pooling (fixed per group):** predict $\bar y_i = 70$ exactly. **Complete pooling (single mean):** predict the grand mean $40$. **Partial pooling (random effects / BLUP):** $k=\dfrac{200}{50}=4$, $Z=\dfrac{5}{5+4}=\dfrac59\approx 0.556$; predict $40 + 0.556(70-40)=40+16.67=56.67$. The BLUP sits **between** the two extremes — closer to $70$ here because the group has reasonable data ($Z\approx 0.56$).
Fixed vs random effects
Why does treating a high-cardinality categorical predictor (e.g. 500 territories) as a **random** effect outperform fixed dummy variables?
As **fixed** effects, 500 territories cost 499 parameters; small/empty territories get wildly noisy or unidentifiable estimates, and you cannot predict a brand-new territory. As a **random** effect, you estimate just **one** variance $\sigma_u^2$; each territory's BLUP is shrunk toward the mean by its own credibility, sparse territories are stabilized, and a **new** territory is predicted by the overall mean ($\hat u=0$). This is the credibility/regularization benefit of random effects.
Variance components
Decompose total variance and verify the variance-component identity. A random-intercept model has $\sigma_u^2 = 7$, $\sigma_e^2 = 13$. Give $\text{Var}(y_{ij})$, the within-group covariance, the ICC, and the implied $k$.
Total: $\text{Var}(y_{ij}) = \sigma_u^2+\sigma_e^2 = 7+13 = 20$. Within-group covariance (same group, different obs): $\text{Cov}=\sigma_u^2 = 7$. ICC: $\rho=\dfrac{7}{20}=0.35$ (= within-group correlation $=\dfrac{\text{Cov}}{\text{Var}}=\dfrac{7}{20}$). ✓ Credibility constant: $k=\dfrac{\sigma_e^2}{\sigma_u^2}=\dfrac{13}{7}\approx 1.857$.
Fixed vs random effects
State the three nested model structures (complete pooling, no pooling, partial pooling) as formulas, and say which the LMM is.
**Complete pooling:** $y_{ij}=\beta_0+\epsilon_{ij}$ — one mean for everyone ($\sigma_u^2=0$, $Z=0$). **No pooling:** $y_{ij}=\beta_{0,i}+\epsilon_{ij}$ — a free fixed intercept per group ($Z=1$ effectively). **Partial pooling (LMM):** $y_{ij}=\beta_0+u_i+\epsilon_{ij}$, $u_i\sim N(0,\sigma_u^2)$ — group intercepts shrunk toward $\beta_0$ with weight $Z=\dfrac{n_i}{n_i+k}$. The linear mixed model is the **partial-pooling** middle ground; it reduces to complete pooling as $\sigma_u^2\to0$ ($Z\to0$) and to no pooling as $\sigma_u^2\to\infty$ ($Z\to1$).