Exam MAS-I — Statistics & Simulation Flashcards
Statistical inference and simulation for CAS Exam MAS-I: estimator properties (bias, variance, MSE, consistency, efficiency), maximum likelihood and the method of moments with closed-form solutions, Fisher information and asymptotic normality, confidence intervals for means and proportions, hypothesis testing with Type I/II errors, power, p-values, and z/t/chi-square tests, plus the inverse-transform method, Monte Carlo run sizing, and the bootstrap — with fully worked numeric examples.
Browse all 44 cards as a list
- Point estimation: Define the **bias** of an estimator $\hat\theta$ of a parameter $\theta$, and state what **unbiased** means. Bias is the systematic error: $\text{bias}(\hat\theta)=E[\hat\theta]-\theta$. An estimator is **unbiased** when $\text{bias}(\hat\theta)=0$, i.e. $E[\hat\theta]=\theta$ for every value of $\theta$. Bias measures whether the estimator is centered on the target on average; it says nothing about spread.
- Bias, variance & MSE: State the relationship between **mean squared error**, variance, and bias. $\text{MSE}(\hat\theta)=E\!\left[(\hat\theta-\theta)^{2}\right]=\text{Var}(\hat\theta)+\big(\text{bias}(\hat\theta)\big)^{2}$. MSE bundles both error sources: how spread out the estimator is (variance) and how off-center it is (bias squared). For an **unbiased** estimator the bias term vanishes, so $\text{MSE}=\text{Var}(\hat\theta)$.
- Point estimation: Define a **consistent** estimator. $\hat\theta_{n}$ is **consistent** for $\theta$ if it converges in probability to $\theta$ as the sample size grows: for every $\varepsilon>0$, $P\!\left(|\hat\theta_{n}-\theta|>\varepsilon\right)\to 0$ as $n\to\infty$. A convenient sufficient condition: if $\text{bias}(\hat\theta_{n})\to 0$ and $\text{Var}(\hat\theta_{n})\to 0$, then $\text{MSE}\to 0$ and $\hat\theta_{n}$ is consistent (mean-square convergence implies convergence in probability).
- Point estimation: What does it mean for one unbiased estimator to be more **efficient** than another, and what is **relative efficiency**? Among unbiased estimators, the more **efficient** one has the smaller variance. The **relative efficiency** of $\hat\theta_{1}$ to $\hat\theta_{2}$ is $\frac{\text{Var}(\hat\theta_{2})}{\text{Var}(\hat\theta_{1})}$; a value above $1$ favors $\hat\theta_{1}$. An unbiased estimator whose variance attains the **Cramér–Rao lower bound** $\frac{1}{nI(\theta)}$, where $I(\theta)$ is the Fisher information per observation, is called **efficient** and is then a UMVUE. Smaller variance $\Rightarrow$ tighter, more reliable estimates.
- Point estimation: For an i.i.d. sample, why is the sample mean $\bar X$ unbiased for $\mu$, and what is its variance? $E[\bar X]=\frac{1}{n}\sum E[X_{i}]=\frac{1}{n}(n\mu)=\mu$, so $\bar X$ is **unbiased**. By independence, $\text{Var}(\bar X)=\frac{1}{n^{2}}\sum \text{Var}(X_{i})=\frac{n\sigma^{2}}{n^{2}}=\frac{\sigma^{2}}{n}$. The variance $\frac{\sigma^{2}}{n}\to 0$ as $n\to\infty$, so $\bar X$ is also **consistent** for $\mu$.
- Bias, variance & MSE: Why is the divisor $n-1$ used in the sample variance $S^{2}=\frac{1}{n-1}\sum(X_{i}-\bar X)^{2}$? Dividing by $n-1$ makes $S^{2}$ **unbiased**: $E[S^{2}]=\sigma^{2}$. Dividing by $n$ instead gives an estimator with expectation $\frac{n-1}{n}\sigma^{2}$, which underestimates $\sigma^{2}$ on average because the deviations are taken about the estimated mean $\bar X$ rather than the true $\mu$. The $n-1$ divisor exactly corrects for the one degree of freedom lost to the constraint $\sum(X_{i}-\bar X)=0$.
- Bias, variance & MSE: An estimator $\hat\theta$ has $E[\hat\theta]=0.9\theta$ and $\text{Var}(\hat\theta)=0.04\theta^{2}$. Find its bias and MSE at $\theta=10$. Bias $=E[\hat\theta]-\theta = 0.9\theta-\theta=-0.1\theta$. At $\theta=10$: bias $=-1$. Variance at $\theta=10$: $0.04(10)^{2}=4$. $\text{MSE}=\text{Var}+\text{bias}^{2}=4+(-1)^{2}=4+1=5$.
- Point estimation: Two unbiased estimators of $\mu$ have variances $\text{Var}(\hat\theta_{1})=\frac{\sigma^{2}}{n}$ and $\text{Var}(\hat\theta_{2})=\frac{2\sigma^{2}}{n+1}$. Which is more efficient for $n=9$, and by how much? Compare variances at $n=9$: $\text{Var}(\hat\theta_{1})=\frac{\sigma^{2}}{9}\approx 0.1111\,\sigma^{2}$ and $\text{Var}(\hat\theta_{2})=\frac{2\sigma^{2}}{10}=0.2\,\sigma^{2}$. Since $0.1111<0.2$, $\hat\theta_{1}$ is more efficient. Relative efficiency of $\hat\theta_{1}$ to $\hat\theta_{2}$ $=\frac{\text{Var}(\hat\theta_{2})}{\text{Var}(\hat\theta_{1})}=\frac{0.2}{0.1111}\approx 1.80$, i.e. $\hat\theta_{2}$ has $1.80$ times the variance of $\hat\theta_{1}$ (equivalently $\hat\theta_{1}$'s variance is about $44\%$ smaller).
- Bias, variance & MSE: Consider estimating $\sigma^{2}$ by $T=\frac{1}{n}\sum(X_{i}-\bar X)^{2}$ (divisor $n$). Find its bias for normal data. We know $E\!\left[\sum(X_{i}-\bar X)^{2}\right]=(n-1)\sigma^{2}$ (this holds for any i.i.d. data with finite variance, not only normal data). So $E[T]=\frac{(n-1)\sigma^{2}}{n}=\left(1-\tfrac{1}{n}\right)\sigma^{2}$. Bias $=E[T]-\sigma^{2}=-\frac{\sigma^{2}}{n}<0$ (it underestimates). The bias $\to 0$ as $n\to\infty$, so $T$ is **asymptotically unbiased** and consistent.
- Maximum likelihood: Write the general **likelihood** $L(\theta)$ and **log-likelihood** $\ell(\theta)$ for an i.i.d. sample, and state the MLE recipe. $L(\theta)=\prod_{i=1}^{n} f(x_{i};\theta)$ and $\ell(\theta)=\ln L(\theta)=\sum_{i=1}^{n}\ln f(x_{i};\theta)$. The **MLE** $\hat\theta$ maximizes $L$ (equivalently $\ell$). Solve the score equation $\frac{d}{d\theta}\ell(\theta)=0$ and verify it is a maximum (e.g. $\ell''<0$). Maximizing $\ell$ is easier than $L$ because the product becomes a sum.
- Maximum likelihood: Derive the **MLE of the mean $\theta$** of an exponential distribution $f(x)=\frac{1}{\theta}e^{-x/\theta}$ from an i.i.d. sample. $\ell(\theta)=\sum\left(-\ln\theta-\frac{x_{i}}{\theta}\right)=-n\ln\theta-\frac{1}{\theta}\sum x_{i}$. $\ell'(\theta)=-\frac{n}{\theta}+\frac{1}{\theta^{2}}\sum x_{i}=0 \Rightarrow \frac{\sum x_{i}}{\theta^{2}}=\frac{n}{\theta}\Rightarrow \theta=\frac{\sum x_{i}}{n}$. So $\hat\theta=\bar X$, the sample mean.
- Maximum likelihood: Derive the **MLE of the Poisson rate $\lambda$** from an i.i.d. sample $x_{1},\dots,x_{n}$. $f(x;\lambda)=\frac{e^{-\lambda}\lambda^{x}}{x!}$, so $\ell(\lambda)=\sum\big(-\lambda + x_{i}\ln\lambda - \ln x_{i}!\big)=-n\lambda+\ln\lambda\sum x_{i}-\sum\ln x_{i}!$. $\ell'(\lambda)=-n+\frac{\sum x_{i}}{\lambda}=0\Rightarrow \lambda=\frac{\sum x_{i}}{n}$. So $\hat\lambda=\bar X$.
- Maximum likelihood: Derive the **MLE of the Bernoulli/binomial success probability $p$** from $n$ i.i.d. trials with $\sum x_{i}$ successes. For Bernoulli, $f(x;p)=p^{x}(1-p)^{1-x}$, so $\ell(p)=\left(\sum x_{i}\right)\ln p + \left(n-\sum x_{i}\right)\ln(1-p)$. $\ell'(p)=\frac{\sum x_{i}}{p}-\frac{n-\sum x_{i}}{1-p}=0$. Cross-multiplying: $(1-p)\sum x_{i}=p\,(n-\sum x_{i})\Rightarrow \sum x_{i}=np\Rightarrow \hat p=\frac{\sum x_{i}}{n}=\bar X$.
- Maximum likelihood: A sample of claim counts is $2,0,3,1,4,2$ from a Poisson distribution. Compute the MLE $\hat\lambda$. The Poisson MLE is the sample mean $\hat\lambda=\bar X$. $\sum x_{i}=2+0+3+1+4+2=12$, with $n=6$. $\hat\lambda=\frac{12}{6}=2.0$.
- Maximum likelihood: Exponential lifetimes (mean $\theta$) are observed as $4,7,11,2,6$. Compute the MLE $\hat\theta$. The exponential MLE for the mean is $\hat\theta=\bar X$. $\sum x_{i}=4+7+11+2+6=30$, with $n=5$. $\hat\theta=\frac{30}{5}=6.0$.
- Maximum likelihood: Define the **Fisher information** $I(\theta)$ and state how it gives the asymptotic variance of the MLE. For one observation, $I(\theta)=-E\!\left[\frac{d^{2}}{d\theta^{2}}\ln f(X;\theta)\right]=E\!\left[\left(\frac{d}{d\theta}\ln f(X;\theta)\right)^{2}\right]$. For an i.i.d. sample of size $n$ the total information is $nI(\theta)$, and the MLE is asymptotically normal: $\hat\theta \;\dot\sim\; N\!\left(\theta,\;\frac{1}{nI(\theta)}\right)$. Larger information $\Rightarrow$ smaller asymptotic variance.
- Maximum likelihood: Find the **Fisher information** for the Poisson rate $\lambda$ (per observation) and the asymptotic variance of $\hat\lambda$ for a sample of size $n$. $\ln f = -\lambda + x\ln\lambda - \ln x!$, so $\frac{d}{d\lambda}\ln f = -1+\frac{x}{\lambda}$ and $\frac{d^{2}}{d\lambda^{2}}\ln f = -\frac{x}{\lambda^{2}}$. $I(\lambda)=-E\!\left[-\frac{X}{\lambda^{2}}\right]=\frac{E[X]}{\lambda^{2}}=\frac{\lambda}{\lambda^{2}}=\frac{1}{\lambda}$. So $\text{Var}(\hat\lambda)\approx\frac{1}{nI(\lambda)}=\frac{\lambda}{n}$, matching $\text{Var}(\bar X)$ exactly since $\hat\lambda=\bar X$.
- Maximum likelihood: State the **method of moments** and contrast it with maximum likelihood. Equate the lowest sample moments to the corresponding population moments and solve for the parameters. For one parameter, set $\bar X = E[X]=g(\theta)$ and solve for $\hat\theta$; for two parameters use $\bar X$ and $\frac{1}{n}\sum X_{i}^{2}$. Method-of-moments estimators are simple and need no optimization, but are generally **less efficient** than MLEs and need not be unbiased. MLEs are typically preferred for their asymptotic efficiency.
- Maximum likelihood: A gamma distribution has mean $\alpha\theta$ and variance $\alpha\theta^{2}$. A sample has $\bar X=10$ and sample second moment $\frac{1}{n}\sum X_{i}^{2}=140$. Find the method-of-moments $\hat\alpha,\hat\theta$. The sample variance with divisor $n$ (the second central moment) is $\frac{1}{n}\sum X_{i}^{2}-\bar X^{2}=140-10^{2}=40$. Match moments: $\alpha\theta=10$ and $\alpha\theta^{2}=40$. Divide: $\frac{\alpha\theta^{2}}{\alpha\theta}=\theta=\frac{40}{10}=4$, so $\hat\theta=4$. Then $\hat\alpha=\frac{10}{\hat\theta}=\frac{10}{4}=2.5$.
- Confidence intervals: State the **large-sample (z) confidence interval** for a population mean $\mu$ when $\sigma$ is known, and for unknown $\sigma$. Known $\sigma$: $\bar x \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt n}$, where $z_{\alpha/2}$ is the standard-normal critical value ($z_{0.025}=1.96$ for $95\%$). Unknown $\sigma$ (use the sample $s$): $\bar x \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt n}$, using the $t$ distribution with $n-1$ degrees of freedom. For large $n$ the $t$ critical value $\to z$.
- Confidence intervals: Interpret a **$95\%$ confidence interval** correctly. It means the *procedure* captures the true parameter $95\%$ of the time: if we repeated the sampling and interval construction many times, about $95\%$ of the resulting intervals would contain the fixed true $\theta$. It does **not** mean there is a $95\%$ probability that $\theta$ lies in this particular computed interval: once computed, the interval either contains $\theta$ or it doesn't. The randomness is in the interval, not in $\theta$.
- Confidence intervals: A sample of $n=64$ losses has $\bar x=520$ and known $\sigma=160$. Build a $95\%$ confidence interval for the mean loss. Standard error $=\frac{\sigma}{\sqrt n}=\frac{160}{\sqrt{64}}=\frac{160}{8}=20$. Margin $=z_{0.025}\cdot 20 = 1.96(20)=39.2$. CI $=520\pm 39.2 = (480.8,\;559.2)$.
- Confidence intervals: A sample of $n=25$ claims has $\bar x=4{,}200$ and sample sd $s=900$. Build a $95\%$ confidence interval for the mean (use $t_{0.025,24}=2.064$). Standard error $=\frac{s}{\sqrt n}=\frac{900}{\sqrt{25}}=\frac{900}{5}=180$. Margin $=t_{0.025,24}\cdot 180 = 2.064(180)=371.52$. CI $=4200\pm 371.52 = (3{,}828.48,\;4{,}571.52)$.
- Confidence intervals: State the large-sample (Wald) **confidence interval for a proportion** $p$. With $\hat p=\frac{x}{n}$, the standard error is $\sqrt{\frac{\hat p(1-\hat p)}{n}}$, and the interval is $\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}}$. This normal approximation is reliable when both $n\hat p$ and $n(1-\hat p)$ are at least about $5$ to $10$.
- Confidence intervals: Out of $400$ policyholders, $60$ filed a claim. Build a $95\%$ confidence interval for the claim probability $p$. $\hat p=\frac{60}{400}=0.15$. Standard error $=\sqrt{\frac{0.15(0.85)}{400}}=\sqrt{\frac{0.1275}{400}}=\sqrt{0.00031875}\approx 0.017854$. Margin $=1.96(0.017854)\approx 0.0350$. CI $=0.15\pm 0.0350 = (0.1150,\;0.1850)$.
- Confidence intervals: How many observations $n$ are needed so a $95\%$ confidence interval for a mean has margin of error at most $5$, given $\sigma=40$? Require $z_{0.025}\frac{\sigma}{\sqrt n}\le 5$, i.e. $1.96\cdot\frac{40}{\sqrt n}\le 5$. Solve: $\sqrt n \ge \frac{1.96(40)}{5}=\frac{78.4}{5}=15.68$, so $n\ge 15.68^{2}=245.86$. Round up: $n=246$.
- Hypothesis testing: Define the **null hypothesis** $H_0$, the **alternative** $H_1$, and the two error types in hypothesis testing. $H_0$ is the default claim to be tested; $H_1$ is what we conclude if we reject $H_0$. **Type I error** (probability $\alpha$): rejecting $H_0$ when it is true, a false positive. $\alpha$ is the chosen significance level. **Type II error** (probability $\beta$): failing to reject $H_0$ when $H_1$ is true, a false negative.
- Hypothesis testing: Define the **power** of a test and how it relates to $\beta$. Power $=1-\beta=P(\text{reject }H_0 \mid H_1 \text{ true})$, the probability of correctly detecting a true effect. Power rises with a larger sample size, a larger true effect (further from $H_0$), a larger significance level $\alpha$, and smaller variance. There is a tradeoff: lowering $\alpha$ to reduce Type I error raises $\beta$ and lowers power, all else equal.
- Hypothesis testing: Define the **$p$-value** and state the decision rule. The $p$-value is the probability, assuming $H_0$ is true, of observing a test statistic at least as extreme as the one actually observed (in the direction of $H_1$). Decision rule: **reject $H_0$ if $p\le\alpha$**; otherwise fail to reject. A small $p$-value means the data would be unlikely under $H_0$, i.e. evidence against it. The $p$-value is *not* the probability that $H_0$ is true.
- Hypothesis testing: Distinguish a **one-tailed** from a **two-tailed** test and how the critical value changes. A **two-tailed** test ($H_1:\mu\neq\mu_0$) splits $\alpha$ across both tails, using $z_{\alpha/2}$ ($1.96$ for $\alpha=0.05$). A **one-tailed** test ($H_1:\mu>\mu_0$ or $\mu<\mu_0$) puts all of $\alpha$ in one tail, using $z_{\alpha}$ ($1.645$ for $\alpha=0.05$). The one-tailed test has a less extreme critical value, so it has more power against the specified direction but cannot detect effects in the other direction.
- Hypothesis testing: Test $H_0:\mu=100$ vs $H_1:\mu\neq 100$ at $\alpha=0.05$ given $n=36$, $\bar x=106$, known $\sigma=18$. State the conclusion. Test statistic $z=\frac{\bar x-\mu_0}{\sigma/\sqrt n}=\frac{106-100}{18/\sqrt{36}}=\frac{6}{18/6}=\frac{6}{3}=2.0$. Two-tailed critical value is $z_{0.025}=1.96$. Since $|2.0|>1.96$, **reject $H_0$**. The two-sided $p$-value $=2\,P(Z>2.0)=2(0.02275)\approx 0.0455<0.05$, consistent with rejection.
- Hypothesis testing: Test $H_0:\mu=50$ vs $H_1:\mu>50$ at $\alpha=0.05$ with $n=16$, $\bar x=53.5$, sample $s=8$ (use $t_{0.05,15}=1.753$). Test statistic $t=\frac{\bar x-\mu_0}{s/\sqrt n}=\frac{53.5-50}{8/\sqrt{16}}=\frac{3.5}{8/4}=\frac{3.5}{2}=1.75$. One-tailed critical value $t_{0.05,15}=1.753$. Since $1.75<1.753$, **fail to reject $H_0$** (just barely). The evidence for $\mu>50$ is not quite significant at the $5\%$ level.
- Hypothesis testing: Observed counts in 4 categories are $O=(22,18,20,40)$; the model predicts $E=(25,25,25,25)$. Compute the chi-square goodness-of-fit statistic and decide at $\alpha=0.05$ (critical $\chi^{2}_{0.05,3}=7.815$). Statistic $\chi^{2}=\sum_{j}\frac{(O_{j}-E_{j})^{2}}{E_{j}}$: $\chi^{2}=\frac{(22-25)^{2}}{25}+\frac{(18-25)^{2}}{25}+\frac{(20-25)^{2}}{25}+\frac{(40-25)^{2}}{25}$ $=\frac{9}{25}+\frac{49}{25}+\frac{25}{25}+\frac{225}{25}=\frac{308}{25}=12.32$. Degrees of freedom $=(\text{cells})-1=4-1=3$. Since $12.32>7.815$, **reject $H_0$**: the model fits poorly (the last category is far over-observed).
- Hypothesis testing: Compute the **power** of the test $H_0:\mu=0$ vs $H_1:\mu>0$, $\alpha=0.05$, $n=25$, known $\sigma=10$, when the true mean is $\mu=4$. Reject when $\bar X > 0 + z_{0.05}\frac{\sigma}{\sqrt n}=1.645\cdot\frac{10}{5}=3.29$. Under the true $\mu=4$, $\text{sd}(\bar X)=\frac{10}{5}=2$, so $\bar X\sim N\!\left(4,\,2^{2}\right)$. Power $=P(\bar X>3.29\mid\mu=4)=P\!\left(Z>\frac{3.29-4}{2}\right)=P(Z>-0.355)\approx 0.639$. So the power is about $64\%$.
- Monte Carlo & bootstrap: State the **inverse-transform method** for generating a random variate with cdf $F$. If $U\sim\text{Unif}(0,1)$, then $X=F^{-1}(U)$ has cdf $F$. Procedure: draw $u$ from $\text{Unif}(0,1)$, then solve $F(x)=u$ for $x$. This works because $P(X\le x)=P(F^{-1}(U)\le x)=P(U\le F(x))=F(x)$. It requires an invertible (or at least solvable) cdf and uses exactly one uniform per variate.
- Monte Carlo & bootstrap: Derive the **inverse-transform formula for an exponential** with rate $\lambda$ (mean $\frac{1}{\lambda}$). The cdf is $F(x)=1-e^{-\lambda x}$. Set $F(x)=u$: $1-e^{-\lambda x}=u \Rightarrow e^{-\lambda x}=1-u \Rightarrow x=-\frac{1}{\lambda}\ln(1-u)$. So $X=-\frac{1}{\lambda}\ln(1-U)$. Since $1-U$ is also $\text{Unif}(0,1)$, the equivalent simplification $X=-\frac{1}{\lambda}\ln U$ is often used.
- Monte Carlo & bootstrap: Using $X=-\frac{1}{\lambda}\ln(1-U)$ with $\lambda=0.5$ and the uniform draw $U=0.8$, generate an exponential variate. $X=-\frac{1}{0.5}\ln(1-0.8)=-2\ln(0.2)$. $\ln(0.2)\approx -1.609438$, so $X=-2(-1.609438)=3.218876$. $X\approx 3.219$.
- Monte Carlo & bootstrap: Explain how to generate a **discrete** random variate (e.g. a claim count) by inversion using a uniform $U$. Build the cumulative probabilities $F(0)\le F(1)\le\cdots$ and partition $[0,1)$ into intervals $\big[F(k-1),F(k)\big)$, taking $F(-1)=0$. Draw $U\sim\text{Unif}(0,1)$ and return the smallest $x$ with $F(x)>U$; equivalently, return $x=k$ when $F(k-1)\le U < F(k)$. (Using $F(x)\ge U$ instead changes only the boundary cases $U=F(k)$, which have probability zero.) Each value's interval width equals its probability, so the output has the target pmf.
- Monte Carlo & bootstrap: A discrete loss has $P(X=0)=0.5$, $P(X=1)=0.3$, $P(X=2)=0.2$. Using $U=0.65$ and the inversion rule, what value is generated? Cumulative cutoffs: $F(0)=0.5$, $F(1)=0.8$, $F(2)=1.0$, giving intervals $[0,0.5)\to 0$, $[0.5,0.8)\to 1$, $[0.8,1.0)\to 2$. $U=0.65$ falls in $[0.5,0.8)$, so the generated value is $X=1$.
- Monte Carlo & bootstrap: In **Monte Carlo estimation**, how is a quantity $\mu=E[g(X)]$ estimated and what is the standard error? Simulate $N$ independent draws and average: $\hat\mu=\frac{1}{N}\sum_{i=1}^{N} g(X_{i})$. By the LLN $\hat\mu\to\mu$, and the standard error is $\text{SE}=\frac{s}{\sqrt N}$ where $s$ is the sample sd of the $g(X_{i})$ values. Accuracy improves like $\frac{1}{\sqrt N}$, so cutting the error in half requires $4\times$ the runs.
- Monte Carlo & bootstrap: A Monte Carlo simulation of a portfolio loss has per-run standard deviation $s=2{,}000$. How many runs $N$ are needed for a standard error of at most $25$? Require $\frac{s}{\sqrt N}\le 25$, i.e. $\frac{2000}{\sqrt N}\le 25$. Solve: $\sqrt N\ge\frac{2000}{25}=80$, so $N\ge 80^{2}=6400$. Thus at least $N=6{,}400$ runs.
- Monte Carlo & bootstrap: Describe the **bootstrap** and what it estimates. The (nonparametric) bootstrap resamples the observed data **with replacement** to approximate the sampling distribution of a statistic. Draw $B$ resamples each of size $n$ from the original sample, recompute the statistic $\hat\theta^{*}_{b}$ on each, and use the spread of the $\hat\theta^{*}$ values. The **bootstrap standard error** is the sample sd of the $\hat\theta^{*}_{b}$, and a percentile interval uses the empirical quantiles of those replicates. It is useful when an analytic standard error is hard to derive.
- Monte Carlo & bootstrap: Three bootstrap resamples of a sample give statistic values $\hat\theta^{*}=12,\,16,\,14$. Estimate the bootstrap standard error. Mean of replicates $=\frac{12+16+14}{3}=\frac{42}{3}=14$. Squared deviations: $(12-14)^{2}=4$, $(16-14)^{2}=4$, $(14-14)^{2}=0$; sum $=8$. Bootstrap SE (using divisor $B-1=2$) $=\sqrt{\frac{8}{2}}=\sqrt{4}=2$.
- Monte Carlo & bootstrap: Explain how the bootstrap can estimate the **bias** of an estimator $\hat\theta$. Compute $\hat\theta$ on the original data, then on each bootstrap resample obtain $\hat\theta^{*}_{b}$. The bootstrap bias estimate is $\widehat{\text{bias}}=\bar{\theta}^{*}-\hat\theta$, where $\bar{\theta}^{*}=\frac{1}{B}\sum_{b}\hat\theta^{*}_{b}$. The resampling treats $\hat\theta$ as the "true" parameter and the resamples as new samples, so the average over-/under-shoot mimics the estimator's bias. A bias-corrected estimate is $\hat\theta-\widehat{\text{bias}}=2\hat\theta-\bar\theta^{*}$.
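
The divisor cards above ($n$ versus $n-1$ in the variance estimator) can be checked by simulation. The following is a minimal sketch, not part of the deck: the function names, the choice of normal data with $\sigma=2$ and $n=5$, and the number of repetitions are all illustrative assumptions. Averaged over many samples, the divisor-$n$ estimate should land near $\left(1-\tfrac{1}{n}\right)\sigma^{2}=3.2$ and the divisor-$(n-1)$ estimate near $\sigma^{2}=4$.

```python
import random


def variance_estimates(sample):
    """Return (T, S2): the divisor-n and divisor-(n-1) variance estimates."""
    n = len(sample)
    mean = sum(sample) / n
    sum_sq = sum((x - mean) ** 2 for x in sample)
    return sum_sq / n, sum_sq / (n - 1)


def average_estimates(n, sigma, reps, seed=0):
    """Average T and S2 over many simulated N(0, sigma^2) samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total_t = 0.0
    total_s2 = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        t, s2 = variance_estimates(sample)
        total_t += t
        total_s2 += s2
    return total_t / reps, total_s2 / reps


# Illustrative settings (assumptions): sigma^2 = 4, n = 5, 20,000 repetitions.
mean_t, mean_s2 = average_estimates(n=5, sigma=2.0, reps=20000)
print(mean_t, mean_s2)  # theory predicts values near 3.2 and 4.0
```

Because $T=\frac{n-1}{n}S^{2}$ on every sample, the two averages differ by exactly that factor; the simulation only confirms which of them is centered on $\sigma^{2}$.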
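
The closed-form estimators in the maximum-likelihood and method-of-moments cards above can be verified numerically. This is a hedged sketch: the helper names are hypothetical, and the check only compares the log-likelihood at the sample mean with nearby parameter values rather than running a general optimizer. The data are the claim counts, lifetimes, and gamma moments quoted in the cards.

```python
import math


def poisson_loglik(lam, xs):
    """Poisson log-likelihood; math.lgamma(x + 1) equals ln(x!)."""
    return sum(-lam + x * math.log(lam) - math.lgamma(x + 1) for x in xs)


def exponential_loglik(theta, xs):
    """Exponential log-likelihood parameterized by the mean theta."""
    return sum(-math.log(theta) - x / theta for x in xs)


def gamma_method_of_moments(mean, second_moment):
    """Match alpha*theta to the mean and alpha*theta^2 to the divisor-n variance."""
    variance = second_moment - mean ** 2
    theta = variance / mean
    alpha = mean / theta
    return alpha, theta


claims = [2, 0, 3, 1, 4, 2]
lam_hat = sum(claims) / len(claims)          # Poisson MLE = sample mean
lifetimes = [4, 7, 11, 2, 6]
theta_hat = sum(lifetimes) / len(lifetimes)  # exponential MLE = sample mean

# Asymptotic variance of the Poisson MLE, lambda / n, evaluated at lam_hat.
lam_asymptotic_var = lam_hat / len(claims)

print(lam_hat, theta_hat, gamma_method_of_moments(10, 140))
```

Evaluating `poisson_loglik` or `exponential_loglik` slightly to either side of the estimate should give a smaller value, which is a quick sanity check that the score equation found a maximum.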
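
The confidence-interval cards above reduce to a few lines of arithmetic. The sketch below is one possible way to reproduce them with Python's standard library; the function names are illustrative assumptions. It uses `statistics.NormalDist` for the normal critical value (about $1.95996$ rather than the rounded $1.96$, so results can differ from the cards in the third or fourth decimal), and it takes the $t$ critical value as an input because the standard library has no $t$ distribution.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided standard-normal critical value, e.g. about 1.96 for 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def mean_interval(xbar, sd, n, critical):
    """xbar +/- critical * sd / sqrt(n); pass a z or a t critical value."""
    margin = critical * sd / math.sqrt(n)
    return xbar - margin, xbar + margin


def wald_interval(successes, n, confidence=0.95):
    """Large-sample (Wald) interval for a proportion."""
    p_hat = successes / n
    margin = z_critical(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def sample_size_for_margin(sigma, margin, confidence=0.95):
    """Smallest n whose z-interval margin is at most `margin`."""
    return math.ceil((z_critical(confidence) * sigma / margin) ** 2)


known_sigma_ci = mean_interval(520, 160, 64, z_critical(0.95))
t_ci = mean_interval(4200, 900, 25, 2.064)  # t_{0.025,24} as given on the card
claim_ci = wald_interval(60, 400)
n_needed = sample_size_for_margin(40, 5)
print(known_sigma_ci, t_ci, claim_ci, n_needed)
```

Passing the critical value explicitly keeps the known-$\sigma$ and unknown-$\sigma$ cases in a single function; only the multiplier changes.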
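
The worked hypothesis-testing cards above (two-sided $z$ test, chi-square goodness of fit, and power of an upper-tailed $z$ test) can be recomputed as follows. This is a sketch under the cards' own assumptions (known $\sigma$, normal sampling distribution); the function names are hypothetical, and the chi-square decision still relies on the tabulated critical value $7.815$ because the standard library has no chi-square distribution.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def z_test_two_sided(xbar, mu0, sigma, n):
    """Return (z, two-sided p-value) for H0: mu = mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / n ** 0.5)
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value


def chi_square_statistic(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def power_upper_tailed(mu0, mu_true, sigma, n, alpha):
    """P(reject H0) for H1: mu > mu0 when the true mean is mu_true."""
    se = sigma / n ** 0.5
    cutoff = mu0 + STD_NORMAL.inv_cdf(1 - alpha) * se  # rejection threshold for xbar
    return 1 - NormalDist(mu_true, se).cdf(cutoff)


z, p_value = z_test_two_sided(106, 100, 18, 36)
chi2 = chi_square_statistic([22, 18, 20, 40], [25, 25, 25, 25])
reject_fit = chi2 > 7.815  # tabulated critical value for 3 degrees of freedom
power = power_upper_tailed(0, 4, 10, 25, 0.05)
print(z, p_value, chi2, reject_fit, power)
```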
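
The inverse-transform cards above translate directly into code. The sketch below assumes the half-open interval rule $F(k-1)\le U<F(k)$ for the discrete case; the function names and the list-of-pairs pmf format are illustrative choices, not something the deck prescribes.

```python
import math


def exponential_from_uniform(u, rate):
    """Invert F(x) = 1 - exp(-rate * x) at u, for 0 <= u < 1."""
    return -math.log(1.0 - u) / rate


def discrete_from_uniform(u, pmf):
    """Return the first value whose cumulative probability exceeds u.

    pmf is a list of (value, probability) pairs in increasing value order.
    """
    cumulative = 0.0
    for value, prob in pmf:
        cumulative += prob
        if u < cumulative:
            return value
    return pmf[-1][0]  # guard against floating-point round-off in the total


x = exponential_from_uniform(0.8, rate=0.5)
loss_pmf = [(0, 0.5), (1, 0.3), (2, 0.2)]
generated = discrete_from_uniform(0.65, loss_pmf)
print(round(x, 6), generated)
```

In a simulation, `u` would come from a uniform generator such as `random.random()`, with one uniform consumed per variate.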
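
The Monte Carlo estimation and run-sizing cards above can be illustrated together. The sketch below is an assumption-laden example rather than a recommended design: the helper names are hypothetical, and the target quantity, $P(X>3)$ for an exponential loss with rate $0.5$, was chosen only because its true value $e^{-1.5}\approx 0.2231$ is known and can be compared with the estimate.

```python
import math
import random


def monte_carlo_mean(g, draw, n_runs, seed=0):
    """Estimate E[g(X)] and its standard error s / sqrt(N)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    values = [g(draw(rng)) for _ in range(n_runs)]
    mean = sum(values) / n_runs
    sample_var = sum((v - mean) ** 2 for v in values) / (n_runs - 1)
    return mean, math.sqrt(sample_var / n_runs)


def runs_needed(per_run_sd, target_se):
    """Smallest N with per_run_sd / sqrt(N) <= target_se."""
    return math.ceil((per_run_sd / target_se) ** 2)


# Illustrative target (assumption): P(X > 3) for X ~ Exponential(rate 0.5).
estimate, se = monte_carlo_mean(
    g=lambda x: 1.0 if x > 3 else 0.0,
    draw=lambda rng: rng.expovariate(0.5),
    n_runs=10000,
)
print(estimate, se, runs_needed(2000, 25))
```

The last call reproduces the run-sizing card: a per-run standard deviation of $2{,}000$ and a target standard error of $25$ give $6{,}400$ runs.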
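
The bootstrap cards above (standard error and bias estimate) can be sketched with the standard library. The function names, the number of resamples $B=2000$, and the reuse of the exponential-lifetime data as the sample are all illustrative assumptions. The statistic here is the sample mean, for which the bootstrap bias estimate should be close to zero.

```python
import random
import statistics


def bootstrap_replicates(data, statistic, n_resamples, seed=0):
    """Recompute `statistic` on resamples drawn with replacement, each of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]


def bootstrap_se(replicates):
    """Sample sd of the replicates (divisor B - 1)."""
    return statistics.stdev(replicates)


def bootstrap_bias(replicates, theta_hat):
    """Mean of the replicates minus the estimate on the original data."""
    return statistics.fmean(replicates) - theta_hat


# The three-replicate card: replicates 12, 16, 14.
card_se = bootstrap_se([12, 16, 14])

# Illustrative run on the lifetimes 4, 7, 11, 2, 6 with the mean as statistic.
lifetimes = [4, 7, 11, 2, 6]
theta_hat = statistics.fmean(lifetimes)
replicates = bootstrap_replicates(lifetimes, statistics.fmean, n_resamples=2000)
se_hat = bootstrap_se(replicates)
bias_hat = bootstrap_bias(replicates, theta_hat)
bias_corrected = theta_hat - bias_hat  # equals 2 * theta_hat - mean(replicates)
print(card_se, se_hat, bias_hat, bias_corrected)
```

For a statistic other than the mean, only the `statistic` argument changes, for example `statistics.median`.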