{
  "deckName": "Exam MAS-II — Bayesian Analysis",
  "examCode": "Exam MAS-II",
  "cards": [
    {
      "front": "State **Bayes' theorem** for a parameter $\\theta$ given data $x$, naming each piece.",
      "back": "$\\pi(\\theta\\mid x)=\\frac{L(x\\mid\\theta)\\,\\pi(\\theta)}{\\int L(x\\mid\\theta)\\,\\pi(\\theta)\\,d\\theta}\\propto L(x\\mid\\theta)\\,\\pi(\\theta)$.\n$\\pi(\\theta)$ = **prior** (belief before data); $L(x\\mid\\theta)$ = **likelihood** (model for the data given $\\theta$); the denominator is the **marginal / normalizing constant** $m(x)=\\int L(x\\mid\\theta)\\pi(\\theta)\\,d\\theta$; and $\\pi(\\theta\\mid x)$ is the **posterior**.\nThe denominator does not depend on $\\theta$, so up to proportionality **posterior $\\propto$ likelihood $\\times$ prior**.",
      "tag": "Bayes theorem & posteriors"
    },
    {
      "front": "Why can you usually work with the posterior **only up to proportionality** in $\\theta$?",
      "back": "The marginal $m(x)=\\int L(x\\mid\\theta)\\pi(\\theta)\\,d\\theta$ is a constant in $\\theta$ — it only rescales the curve so it integrates to $1$.\nSo you can drop any factor not involving $\\theta$, recognize the **kernel** of a known density (e.g. a $\\text{Beta}$ or $\\text{Gamma}$ shape), and supply the normalizing constant from that family. This is exactly how conjugate updates are done by inspection without computing the integral.",
      "tag": "Bayes theorem & posteriors"
    },
    {
      "front": "A coin has unknown $\\theta=P(\\text{heads})$. Prior: $\\theta=0.4$ with prob $0.5$ and $\\theta=0.7$ with prob $0.5$ (a discrete prior). You flip once and get **heads**. Find the posterior.",
      "back": "Likelihood of heads is just $\\theta$. Compute joint $=\\pi(\\theta)L$:\n$\\theta=0.4$: $0.5\\times 0.4 = 0.20$.\n$\\theta=0.7$: $0.5\\times 0.7 = 0.35$.\nMarginal $m=0.20+0.35=0.55$.\nPosterior: $\\pi(0.4\\mid H)=\\frac{0.20}{0.55}\\approx 0.3636$, $\\pi(0.7\\mid H)=\\frac{0.35}{0.55}\\approx 0.6364$.\nThe heads outcome shifts weight toward the larger $\\theta$.",
      "tag": "Bayes theorem & posteriors"
    },
    {
      "front": "Two urns: $A$ has $\\frac{1}{3}$ defective, $B$ has $\\frac{1}{6}$ defective; an urn is picked at random (prior $0.5$ each). A drawn item is **defective**. Posterior probability it came from urn $A$?",
      "back": "$P(A\\mid D)=\\frac{P(D\\mid A)P(A)}{P(D\\mid A)P(A)+P(D\\mid B)P(B)}$.\nNumerator $=\\frac{1}{3}\\cdot\\frac{1}{2}=\\frac{1}{6}$. Other term $=\\frac{1}{6}\\cdot\\frac{1}{2}=\\frac{1}{12}$.\n$P(A\\mid D)=\\frac{1/6}{1/6+1/12}=\\frac{1/6}{1/4}=\\frac{2}{3}\\approx 0.667$.\nObserving a defect favors the higher-defect urn $A$.",
      "tag": "Bayes theorem & posteriors"
    },
    {
      "front": "What does it mean for a prior to be **conjugate** to a likelihood, and why is it useful?",
      "back": "A prior family is **conjugate** to a likelihood if the resulting posterior is in the **same family** as the prior — only the parameters change.\nThe update is then closed-form: read off the new parameters and the posterior mean from the family's formulas, with no integration. The conjugate families on the syllabus are beta-binomial, gamma-Poisson, normal-normal (known variance), and gamma-exponential (gamma is conjugate for the rate).",
      "tag": "Conjugate priors"
    },
    {
      "front": "Beta-binomial: with prior $\\theta\\sim\\text{Beta}(\\alpha,\\beta)$ and $x$ successes in $n$ Bernoulli trials, give the posterior and its mean.",
      "back": "Posterior: $\\theta\\mid x \\sim \\text{Beta}(\\alpha+x,\\ \\beta+n-x)$.\nPosterior mean $=\\frac{\\alpha+x}{\\alpha+\\beta+n}$.\nThe data add $x$ to the first ('success') shape and $n-x$ to the second ('failure') shape. The prior acts like $\\alpha$ pseudo-successes and $\\beta$ pseudo-failures already observed.",
      "tag": "Conjugate priors"
    },
    {
      "front": "Prior $\\theta\\sim\\text{Beta}(2,3)$ on a claim probability; in $n=10$ policies you observe $x=4$ claims. Find the posterior and its mean.",
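      "back": "Posterior $=\\text{Beta}(\\alpha+x,\\ \\beta+n-x)=\\text{Beta}(2+4,\\ 3+10-4)=\\text{Beta}(6,9)$.\nPosterior mean $=\\frac{6}{6+9}=\\frac{6}{15}=0.40$.\nCheck: the posterior mean is the weighted average $Z\\cdot\\frac{x}{n}+(1-Z)\\cdot\\frac{\\alpha}{\\alpha+\\beta}$ with $Z=\\frac{n}{n+\\alpha+\\beta}=\\frac{10}{15}$. Here the prior mean $\\frac{2}{5}=0.40$ and the sample proportion $\\frac{4}{10}=0.40$ agree, so any weighted average of them is also $0.40$.",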
      "tag": "Conjugate priors"
    },
    {
      "front": "Gamma-Poisson: with prior $\\lambda\\sim\\text{Gamma}(\\alpha,\\beta)$ (rate parameterization, mean $\\alpha/\\beta$) and $n$ Poisson counts $x_1,\\dots,x_n$, give the posterior and its mean.",
      "back": "Posterior: $\\lambda\\mid \\mathbf{x}\\sim\\text{Gamma}\\!\\left(\\alpha+\\textstyle\\sum_{i=1}^{n} x_i,\\ \\beta+n\\right)$.\nPosterior mean $=\\frac{\\alpha+\\sum x_i}{\\beta+n}$.\nWith the rate parameterization the data add the total count $\\sum x_i$ to the shape and the exposure $n$ to the rate parameter $\\beta$.",
      "tag": "Conjugate priors"
    },
    {
      "front": "Prior $\\lambda\\sim\\text{Gamma}(\\alpha=3,\\beta=2)$ (rate form) for a Poisson claim frequency. You observe $n=4$ years with counts $1,0,2,3$. Find the posterior mean.",
      "back": "$\\sum x_i = 1+0+2+3 = 6$, $n=4$.\nPosterior $=\\text{Gamma}(\\alpha+\\sum x_i,\\ \\beta+n)=\\text{Gamma}(3+6,\\ 2+4)=\\text{Gamma}(9,6)$.\nPosterior mean $=\\frac{9}{6}=1.5$.\nThe posterior mean is the weighted average $Z\\bar{x}+(1-Z)\\frac{\\alpha}{\\beta}$ with $Z=\\frac{n}{n+\\beta}=\\frac{4}{6}$. The prior mean $\\frac{3}{2}=1.5$ and the sample mean $\\frac{6}{4}=1.5$ agree, so the Bayes estimate is $1.5$.",
      "tag": "Conjugate priors"
    },
    {
      "front": "Normal-normal (known variance $\\sigma^2$): prior $\\mu\\sim N(\\mu_0,\\tau^2)$, data mean $\\bar{x}$ from $n$ obs. Give the posterior.",
      "back": "Posterior is normal: $\\mu\\mid\\mathbf{x}\\sim N(\\mu_n,\\ \\sigma_n^2)$ with **precision** adding:\n$\\frac{1}{\\sigma_n^2}=\\frac{1}{\\tau^2}+\\frac{n}{\\sigma^2}$,\nand $\\mu_n=\\sigma_n^2\\!\\left(\\frac{\\mu_0}{\\tau^2}+\\frac{n\\bar{x}}{\\sigma^2}\\right)$.\nThe posterior mean is a precision-weighted average of the prior mean $\\mu_0$ and the sample mean $\\bar{x}$.",
      "tag": "Conjugate priors"
    },
    {
      "front": "Known $\\sigma^2=100$. Prior $\\mu\\sim N(\\mu_0=50,\\ \\tau^2=25)$. You observe $n=4$ values with mean $\\bar{x}=60$. Find the posterior mean and variance.",
      "back": "Precisions: prior $\\frac{1}{\\tau^2}=\\frac{1}{25}=0.04$; data $\\frac{n}{\\sigma^2}=\\frac{4}{100}=0.04$.\nPosterior precision $=0.04+0.04=0.08$, so $\\sigma_n^2=\\frac{1}{0.08}=12.5$.\n$\\mu_n=12.5\\left(\\frac{50}{25}+\\frac{4(60)}{100}\\right)=12.5(2.0+2.4)=12.5(4.4)=55$.\nPosterior: $N(55,\\ 12.5)$ — halfway between $50$ and $60$ since the two precisions are equal.",
      "tag": "Conjugate priors"
    },
    {
      "front": "Gamma-exponential: prior $\\lambda\\sim\\text{Gamma}(\\alpha,\\beta)$ (rate form) on the rate of an exponential, data $x_1,\\dots,x_n$. Give the posterior.",
      "back": "For $X_i\\sim\\text{Exp}(\\text{rate }\\lambda)$, the likelihood is $\\lambda^{n}e^{-\\lambda\\sum x_i}$.\nPosterior $\\propto \\lambda^{\\alpha+n-1}e^{-(\\beta+\\sum x_i)\\lambda}$, i.e. $\\lambda\\mid\\mathbf{x}\\sim\\text{Gamma}\\!\\left(\\alpha+n,\\ \\beta+\\textstyle\\sum x_i\\right)$.\nPosterior mean of the rate $=\\frac{\\alpha+n}{\\beta+\\sum x_i}$. The data add $n$ to the shape and the total $\\sum x_i$ to the rate parameter.",
      "tag": "Conjugate priors"
    },
    {
      "front": "Prior $\\lambda\\sim\\text{Gamma}(\\alpha=2,\\beta=300)$ (rate form) for an exponential claim-severity rate. You observe $n=3$ claims of sizes $100,200,300$. Find the posterior mean of $\\lambda$.",
      "back": "$\\sum x_i = 100+200+300 = 600$, $n=3$.\nPosterior $=\\text{Gamma}(\\alpha+n,\\ \\beta+\\sum x_i)=\\text{Gamma}(2+3,\\ 300+600)=\\text{Gamma}(5,900)$.\nPosterior mean of the rate $=\\frac{5}{900}\\approx 0.005556$.\nThe plug-in reciprocal $1/E[\\lambda\\mid\\text{data}]=\\frac{900}{5}=180$ is **not** the posterior mean claim size, which is $E[1/\\lambda\\mid\\text{data}]=\\frac{\\beta^*}{\\alpha^*-1}=\\frac{900}{4}=225$.",
      "tag": "Conjugate priors"
    },
    {
      "front": "What is the **prior predictive (marginal) distribution**, and how does it differ from the **posterior predictive**?",
      "back": "**Prior predictive** of a new observation $\\tilde{x}$, before seeing data: $m(\\tilde{x})=\\int f(\\tilde{x}\\mid\\theta)\\,\\pi(\\theta)\\,d\\theta$ — average the model over the prior.\n**Posterior predictive**, after observing $x$: $f(\\tilde{x}\\mid x)=\\int f(\\tilde{x}\\mid\\theta)\\,\\pi(\\theta\\mid x)\\,d\\theta$ — average the model over the **posterior**.\nBoth integrate out $\\theta$; the posterior predictive uses updated beliefs and is what you use to predict the next claim.",
      "tag": "Predictive distribution"
    },
    {
      "front": "Why is the predictive distribution **wider** (more variable) than just plugging the posterior mean of $\\theta$ into the model?",
      "back": "The predictive distribution accounts for **two** sources of uncertainty: the natural variability of the observation given $\\theta$ (process/model variance) **and** the remaining uncertainty about $\\theta$ itself (parameter variance).\nBy the law of total variance, $\\text{Var}(\\tilde{x}\\mid x)=E[\\text{Var}(\\tilde{x}\\mid\\theta)\\mid x]+\\text{Var}(E[\\tilde{x}\\mid\\theta]\\mid x)$. Plugging in a single $\\hat\\theta$ drops the second term and understates uncertainty.",
      "tag": "Predictive distribution"
    },
    {
      "front": "Discrete prior: claim count $X\\sim\\text{Poisson}(\\lambda)$ with $\\lambda=1$ (prob $0.6$) or $\\lambda=3$ (prob $0.4$). Find the **prior predictive** probability of exactly $0$ claims.",
      "back": "$P(X=0\\mid\\lambda)=e^{-\\lambda}$.\nMix over the prior: $P(X=0)=0.6\\,e^{-1}+0.4\\,e^{-3}$.\n$e^{-1}\\approx 0.367879$, $e^{-3}\\approx 0.049787$.\n$P(X=0)=0.6(0.367879)+0.4(0.049787)\\approx 0.220728 + 0.019915 = 0.2406$.",
      "tag": "Predictive distribution"
    },
    {
      "front": "Continuing: with the prior $\\lambda=1$ (prob $0.6$) or $\\lambda=3$ (prob $0.4$), you observe **one year with $0$ claims**. Find the posterior on $\\lambda$, then the **posterior predictive** probability of $0$ claims next year.",
      "back": "Posterior $\\propto$ prior $\\times e^{-\\lambda}$: weights $0.6(0.367879)=0.220728$ and $0.4(0.049787)=0.019915$; sum $=0.240643$.\nPosterior: $P(\\lambda=1\\mid 0)=\\frac{0.220728}{0.240643}\\approx 0.91724$, $P(\\lambda=3\\mid 0)=\\frac{0.019915}{0.240643}\\approx 0.08276$.\nPosterior predictive of $0$: $0.91724\\,e^{-1}+0.08276\\,e^{-3}\\approx 0.91724(0.367879)+0.08276(0.049787)\\approx 0.337434+0.004120\\approx 0.3416$.",
      "tag": "Predictive distribution"
    },
    {
      "front": "Gamma-Poisson posterior predictive: with posterior $\\lambda\\mid x\\sim\\text{Gamma}(\\alpha^*,\\beta^*)$ (rate form), what is the distribution of the next count $\\tilde{x}$?",
      "back": "Mixing a Poisson over a gamma rate gives a **negative binomial**. The next count $\\tilde{x}\\mid x$ has\n$E[\\tilde{x}\\mid x]=\\frac{\\alpha^*}{\\beta^*}$ (the posterior mean of $\\lambda$) and variance $\\frac{\\alpha^*}{\\beta^*}\\!\\left(1+\\frac{1}{\\beta^*}\\right)=\\frac{\\alpha^*}{\\beta^*}\\cdot\\frac{\\beta^*+1}{\\beta^*}$.\nThe variance exceeds the Poisson value $\\frac{\\alpha^*}{\\beta^*}$ because parameter uncertainty in $\\lambda$ adds overdispersion.",
      "tag": "Predictive distribution"
    },
    {
      "front": "With posterior $\\lambda\\mid x\\sim\\text{Gamma}(\\alpha^*=9,\\beta^*=6)$ (rate form), find the mean and variance of the **posterior predictive** count $\\tilde{x}$.",
      "back": "Mean $=\\frac{\\alpha^*}{\\beta^*}=\\frac{9}{6}=1.5$.\nVariance $=\\frac{\\alpha^*}{\\beta^*}\\cdot\\frac{\\beta^*+1}{\\beta^*}=1.5\\cdot\\frac{7}{6}=1.75$.\nThe predictive variance $1.75$ exceeds the mean $1.5$ (overdispersion) because of leftover uncertainty in $\\lambda$, unlike a plain Poisson where variance $=$ mean.",
      "tag": "Predictive distribution"
    },
    {
      "front": "Beta-binomial posterior predictive: posterior $\\theta\\mid x\\sim\\text{Beta}(\\alpha^*,\\beta^*)$. Probability that the **next single trial** is a success?",
      "back": "$P(\\tilde{x}=1\\mid x)=\\int_0^1 \\theta\\,\\pi(\\theta\\mid x)\\,d\\theta = E[\\theta\\mid x]=\\frac{\\alpha^*}{\\alpha^*+\\beta^*}$.\nSo the predictive success probability is simply the **posterior mean** of $\\theta$. (For $m$ future trials the count follows a beta-binomial distribution.)",
      "tag": "Predictive distribution"
    },
    {
      "front": "With posterior $\\theta\\mid x\\sim\\text{Beta}(6,9)$, find the probability the next two independent trials are **both successes**.",
      "back": "For two future trials, integrate $\\theta^2$ against the posterior: $E[\\theta^2\\mid x]=\\frac{\\alpha^*(\\alpha^*+1)}{(\\alpha^*+\\beta^*)(\\alpha^*+\\beta^*+1)}$.\n$=\\frac{6\\cdot 7}{15\\cdot 16}=\\frac{42}{240}=0.175$.\nThis exceeds $(\\text{posterior mean})^2=0.40^2=0.16$ because the two outcomes are positively correlated through the shared unknown $\\theta$.",
      "tag": "Predictive distribution"
    },
    {
      "front": "State the **Bayes estimator** of $\\theta$ under each of squared-error, absolute-error, and $0$-$1$ loss.",
      "back": "**Squared-error loss** $(\\hat\\theta-\\theta)^2$ → posterior **mean** $E[\\theta\\mid x]$.\n**Absolute-error loss** $|\\hat\\theta-\\theta|$ → posterior **median**.\n**$0$-$1$ loss** (penalize any miss equally) → posterior **mode** (the MAP estimate).\nEach is the value minimizing the expected posterior loss for its loss function.",
      "tag": "Loss functions & estimators"
    },
    {
      "front": "Prove the posterior **mean** minimizes expected squared-error loss.",
      "back": "Minimize $g(\\hat\\theta)=E[(\\theta-\\hat\\theta)^2\\mid x]$ over $\\hat\\theta$.\n$g'(\\hat\\theta)=E[-2(\\theta-\\hat\\theta)\\mid x]=-2\\big(E[\\theta\\mid x]-\\hat\\theta\\big)=0$.\nSolving gives $\\hat\\theta=E[\\theta\\mid x]$, the posterior mean; $g''=2>0$ confirms a minimum. The minimized value is the posterior variance $\\text{Var}(\\theta\\mid x)$.",
      "tag": "Loss functions & estimators"
    },
    {
      "front": "Posterior $\\theta\\mid x\\sim\\text{Beta}(8,4)$. Give the Bayes estimate under (a) squared-error and (b) $0$-$1$ loss.",
      "back": "(a) Squared-error → posterior **mean** $=\\frac{\\alpha^*}{\\alpha^*+\\beta^*}=\\frac{8}{12}\\approx 0.667$.\n(b) $0$-$1$ loss → posterior **mode** $=\\frac{\\alpha^*-1}{\\alpha^*+\\beta^*-2}=\\frac{7}{10}=0.70$.\nThe mode exceeds the mean because the $\\text{Beta}(8,4)$ density is left-skewed.",
      "tag": "Loss functions & estimators"
    },
    {
      "front": "Discrete posterior on $\\theta$: $P(\\theta=1)=0.2$, $P(\\theta=2)=0.5$, $P(\\theta=4)=0.3$. Give the Bayes estimate under squared-error, absolute-error, and $0$-$1$ loss.",
      "back": "**Squared-error** → mean $=1(0.2)+2(0.5)+4(0.3)=0.2+1.0+1.2=2.4$.\n**Absolute-error** → median: $P(\\theta\\le 1)=0.2<0.5$ but $P(\\theta\\le 2)=0.2+0.5=0.7\\ge 0.5$, so the median is $2$.\n**$0$-$1$ loss** → mode $=2$ (highest mass, $0.5$).",
      "tag": "Loss functions & estimators"
    },
    {
      "front": "How is the **MAP (maximum a posteriori)** estimate related to the MLE, and when do they coincide?",
      "back": "The MAP maximizes the posterior $\\pi(\\theta\\mid x)\\propto L(x\\mid\\theta)\\pi(\\theta)$ — it is the posterior **mode**, the Bayes estimator under $0$-$1$ loss.\nThe MLE maximizes only $L(x\\mid\\theta)$. They coincide when the prior is **flat (constant)** over the relevant range, since then the posterior is proportional to the likelihood. With an informative prior the MAP is pulled toward the prior's high-density region.",
      "tag": "Loss functions & estimators"
    },
    {
      "front": "Under a **weighted/asymmetric** squared-error loss $w(\\theta)(\\hat\\theta-\\theta)^2$, what is the Bayes estimator?",
      "back": "Minimizing $E[w(\\theta)(\\theta-\\hat\\theta)^2\\mid x]$ gives the **weighted posterior mean**\n$\\hat\\theta=\\frac{E[w(\\theta)\\,\\theta\\mid x]}{E[w(\\theta)\\mid x]}$.\nWhen $w\\equiv 1$ this reduces to the ordinary posterior mean. Weighting lets you penalize errors more in regions of $\\theta$ that matter more.",
      "tag": "Loss functions & estimators"
    },
    {
      "front": "With posterior $\\lambda\\mid x\\sim\\text{Gamma}(\\alpha^*=5,\\beta^*=900)$ (rate form), give the Bayes estimate of $\\lambda$ under squared-error loss and its posterior variance.",
      "back": "Squared-error Bayes estimate $=$ posterior mean $=\\frac{\\alpha^*}{\\beta^*}=\\frac{5}{900}\\approx 0.005556$.\nPosterior variance of a $\\text{Gamma}(\\alpha^*,\\beta^*)$ rate $=\\frac{\\alpha^*}{(\\beta^*)^2}=\\frac{5}{900^2}=\\frac{5}{810000}\\approx 6.17\\times 10^{-6}$.\n(Standard deviation $\\approx 0.002484$.)",
      "tag": "Loss functions & estimators"
    },
    {
      "front": "Define a **$95\\%$ credible interval** for $\\theta$ and contrast it with a frequentist confidence interval.",
      "back": "A $95\\%$ **credible interval** $(a,b)$ satisfies $P(a<\\theta<b\\mid x)=0.95$ under the posterior — a direct probability statement about $\\theta$ given the data.\nA frequentist **confidence interval** treats $\\theta$ as fixed: $95\\%$ refers to the long-run coverage of the random interval, not to a probability that this particular interval contains $\\theta$. The Bayesian statement 'there is a $95\\%$ probability $\\theta$ lies in $(a,b)$' is valid; the same wording is incorrect for a confidence interval.",
      "tag": "Credible intervals"
    },
    {
      "front": "Distinguish an **equal-tailed** credible interval from a **highest posterior density (HPD)** interval.",
      "back": "**Equal-tailed:** cut off $2.5\\%$ posterior probability in each tail — bounds are the $0.025$ and $0.975$ posterior quantiles. Easy to compute from quantiles.\n**HPD:** the **shortest** interval containing $95\\%$ posterior probability; every point inside has higher posterior density than any point outside.\nFor a symmetric unimodal posterior the two coincide; for a skewed posterior the HPD is shorter and not equal-tailed.",
      "tag": "Credible intervals"
    },
    {
      "front": "Posterior $\\mu\\mid x\\sim N(55,\\ 12.5)$. Construct a $95\\%$ equal-tailed credible interval for $\\mu$.",
      "back": "Posterior SD $=\\sqrt{12.5}\\approx 3.5355$. Use $z_{0.975}=1.96$.\nInterval $=55\\pm 1.96(3.5355)=55\\pm 6.93$.\nSo the $95\\%$ credible interval is approximately $(48.07,\\ 61.93)$. Because the posterior is symmetric, this is also the HPD interval.",
      "tag": "Credible intervals"
    },
    {
      "front": "Posterior $\\mu\\mid x\\sim N(120,\\ 16)$. Find a **$90\\%$** equal-tailed credible interval and state the probability $\\mu>125$.",
      "back": "Posterior SD $=\\sqrt{16}=4$; $z_{0.95}=1.645$.\n$90\\%$ interval $=120\\pm 1.645(4)=120\\pm 6.58=(113.42,\\ 126.58)$.\n$P(\\mu>125\\mid x)=P\\!\\left(Z>\\frac{125-120}{4}\\right)=P(Z>1.25)\\approx 1-0.8944=0.1056$.",
      "tag": "Credible intervals"
    },
    {
      "front": "From a discrete posterior $P(\\theta=1)=0.10$, $P(\\theta=2)=0.55$, $P(\\theta=3)=0.25$, $P(\\theta=4)=0.10$, give a credible set of probability **at least $0.90$**.",
      "back": "Rank by posterior mass and accumulate until $\\ge 0.90$:\n$\\theta=2$ ($0.55$) → $\\theta=3$ ($+0.25=0.80$) → $\\theta=1$ or $\\theta=4$ (tied at $0.10$; either brings the total to $0.90$).\nThe set $\\{1,2,3\\}$ carries $0.90$ probability, so it is a $90\\%$ credible set (highest-density style); because of the tie, $\\{2,3,4\\}$ is an equally valid $90\\%$ set. Equivalently, excluding $\\theta=4$ (mass $0.10$) leaves $0.90$.",
      "tag": "Credible intervals"
    },
    {
      "front": "What is a **noninformative (vague/diffuse) prior**, and what risk does it carry?",
      "back": "A noninformative prior aims to add little or no information, letting the data dominate — e.g. a flat prior $\\pi(\\theta)\\propto 1$ or a very wide proper prior.\nRisk: a flat prior can be **improper** (does not integrate to a finite value); that is acceptable only if the resulting posterior is still proper. A prior that is flat in one parameterization is generally **not** flat after a nonlinear transformation, so 'noninformative' is not invariant — motivating Jeffreys priors.",
      "tag": "Bayes theorem & posteriors"
    },
    {
      "front": "Define the **Jeffreys prior** and give its key property.",
      "back": "Jeffreys prior is $\\pi(\\theta)\\propto\\sqrt{I(\\theta)}$, where $I(\\theta)=-E\\!\\left[\\frac{\\partial^2}{\\partial\\theta^2}\\ln f(X\\mid\\theta)\\right]$ is the Fisher information.\nKey property: it is **invariant under reparameterization** — applying Jeffreys' rule in any one-to-one transformed parameter gives the consistent transformed prior. For a binomial proportion it is $\\pi(\\theta)\\propto\\theta^{-1/2}(1-\\theta)^{-1/2}$, i.e. $\\text{Beta}(\\tfrac12,\\tfrac12)$.",
      "tag": "Bayes theorem & posteriors"
    },
    {
      "front": "Why do we often need **MCMC**, and what does it produce?",
      "back": "For most realistic models the posterior $\\pi(\\theta\\mid x)\\propto L\\pi$ has an **intractable normalizing constant** (the integral $\\int L\\pi\\,d\\theta$ can't be done in closed form), and $\\theta$ may be high-dimensional.\nMCMC builds a Markov chain whose stationary distribution **is** the posterior, generating dependent draws $\\theta^{(1)},\\theta^{(2)},\\dots$. After discarding burn-in, sample averages estimate posterior means, quantiles give credible intervals, etc. — all without ever computing the normalizing constant.",
      "tag": "MCMC & hierarchical"
    },
    {
      "front": "State the **Metropolis-Hastings acceptance ratio** and the accept/reject rule.",
      "back": "From current $\\theta$, propose $\\theta^*$ from $q(\\theta^*\\mid\\theta)$. The acceptance probability is\n$\\alpha=\\min\\!\\left\\{1,\\ \\dfrac{\\pi(\\theta^*\\mid x)\\,q(\\theta\\mid\\theta^*)}{\\pi(\\theta\\mid x)\\,q(\\theta^*\\mid\\theta)}\\right\\}$.\nDraw $u\\sim U(0,1)$: **accept** $\\theta^*$ if $u\\le\\alpha$, else keep $\\theta$. The ratio uses the posterior only through $L\\pi$, so the unknown normalizing constant cancels. For a **symmetric** proposal ($q(\\theta^*\\mid\\theta)=q(\\theta\\mid\\theta^*)$) it simplifies to $\\alpha=\\min\\{1,\\ \\pi(\\theta^*\\mid x)/\\pi(\\theta\\mid x)\\}$ (Metropolis).",
      "tag": "MCMC & hierarchical"
    },
    {
      "front": "Metropolis step: target $\\pi(\\theta\\mid x)\\propto \\theta^{8}(1-\\theta)^{4}$ (a $\\text{Beta}(9,5)$ kernel). Current $\\theta=0.5$, symmetric proposal $\\theta^*=0.7$. Compute the acceptance probability.",
      "back": "Symmetric proposal → $\\alpha=\\min\\{1,\\ r\\}$ with $r=\\frac{\\theta^{*8}(1-\\theta^*)^4}{\\theta^{8}(1-\\theta)^4}$.\nNumerator: $0.7^{8}(0.3)^4 = 0.05764801\\times 0.0081 \\approx 4.6695\\times 10^{-4}$.\nDenominator: $0.5^{8}(0.5)^4 = 0.5^{12}\\approx 2.4414\\times 10^{-4}$.\n$r\\approx \\frac{4.6695\\times 10^{-4}}{2.4414\\times 10^{-4}}\\approx 1.913$.\nSince $r>1$, $\\alpha=\\min\\{1,1.913\\}=1$ — the move to $0.7$ is **always accepted**.",
      "tag": "MCMC & hierarchical"
    },
    {
      "front": "Metropolis step toward a **lower-density** point. Same target $\\pi\\propto\\theta^{8}(1-\\theta)^4$, current $\\theta=0.5$, symmetric proposal $\\theta^*=0.2$. Compute $\\alpha$ and decide whether to accept if $u=0.10$.",
      "back": "Numerator: $0.2^{8}(0.8)^4 = 2.56\\times 10^{-6}\\times 0.4096 \\approx 1.0486\\times 10^{-6}$.\nDenominator: $0.5^{12}\\approx 2.4414\\times 10^{-4}$.\n$r\\approx \\frac{1.0486\\times 10^{-6}}{2.4414\\times 10^{-4}}\\approx 0.004295$, so $\\alpha=\\min\\{1,0.004295\\}\\approx 0.0043$.\nWith $u=0.10>0.0043$, **reject**: the chain stays at $\\theta=0.5$, and $0.5$ is recorded again as the next draw. A downhill move is accepted with probability equal to the density ratio, so a move into a region this much less probable is accepted only rarely.",
      "tag": "MCMC & hierarchical"
    },
    {
      "front": "Describe **Gibbs sampling** and when it is preferred over Metropolis-Hastings.",
      "back": "Gibbs sampling updates one parameter (or block) at a time by drawing it from its **full conditional** distribution given the current values of all the others: cycle $\\theta_1^{(t+1)}\\sim\\pi(\\theta_1\\mid\\theta_2^{(t)},\\dots,x)$, then $\\theta_2^{(t+1)}\\sim\\pi(\\theta_2\\mid\\theta_1^{(t+1)},\\dots,x)$, and so on.\nEvery draw is **always accepted** (it is a special case of M-H with acceptance $1$). It is preferred when the full conditionals are known standard distributions — common in conjugate hierarchical models.",
      "tag": "MCMC & hierarchical"
    },
    {
      "front": "Define **burn-in**, **thinning**, and how convergence is assessed in MCMC.",
      "back": "**Burn-in:** discard the first portion of the chain, which depends on the arbitrary starting value, before computing summaries.\n**Thinning:** keep every $k$-th draw (optional). It lowers autocorrelation among the kept draws and saves storage, but it discards draws, so it does not increase the total information in the run.\n**Convergence diagnostics:** trace plots (should look like stationary noise), running-mean stability, autocorrelation plots, and the **Gelman-Rubin** statistic $\\hat{R}$ comparing within- vs between-chain variance across multiple chains ($\\hat{R}\\approx 1$ suggests convergence). MCMC estimates are valid only after the chain has reached its stationary (posterior) distribution.",
      "tag": "MCMC & hierarchical"
    },
    {
      "front": "Estimate a posterior mean and a $95\\%$ credible interval from a thinned, post-burn-in MCMC sample of $\\theta$: $0.30,\\ 0.42,\\ 0.55,\\ 0.61,\\ 0.48,\\ 0.37,\\ 0.52,\\ 0.44$ (treat as the retained draws).",
      "back": "Posterior mean $\\approx$ sample average $=\\frac{0.30+0.42+0.55+0.61+0.48+0.37+0.52+0.44}{8}=\\frac{3.69}{8}\\approx 0.461$.\nFor a credible interval, use the empirical quantiles of the draws: sorted $0.30,0.37,0.42,0.44,0.48,0.52,0.55,0.61$. With only $8$ draws, the $2.5\\%$ and $97.5\\%$ empirical quantiles are essentially the extreme draws, giving a crude interval of roughly $(0.30,\\ 0.61)$. A real run uses thousands of draws so that these quantiles, and hence the interval endpoints, are estimated stably.",
      "tag": "MCMC & hierarchical"
    },
    {
      "front": "What is a **hierarchical (multilevel) Bayesian model**, and why is it natural for actuarial credibility?",
      "back": "A hierarchical model adds a layer: data depend on group-level parameters $\\theta_i$, and those $\\theta_i$ share a common **hyperprior** with hyperparameters $\\phi$. Schematically $x_{ij}\\mid\\theta_i\\sim f$, $\\theta_i\\mid\\phi\\sim g$, $\\phi\\sim\\pi(\\phi)$.\nThis 'partial pooling' shrinks each group's estimate toward the overall mean by an amount that depends on how much data the group has — exactly the **credibility** idea. Empirical Bayes estimates $\\phi$ from the data; full Bayes puts a prior on $\\phi$ and integrates (typically via MCMC).",
      "tag": "MCMC & hierarchical"
    },
    {
      "front": "In a hierarchical normal model, the group posterior mean is $\\hat\\theta_i = Z_i\\,\\bar{x}_i + (1-Z_i)\\,\\mu$ with $Z_i=\\frac{n_i}{n_i+k}$, $k=\\sigma^2/\\tau^2$. Compute $\\hat\\theta_i$ for $\\bar{x}_i=80$, $\\mu=70$, $n_i=5$, $\\sigma^2=100$, $\\tau^2=25$.",
      "back": "$k=\\frac{\\sigma^2}{\\tau^2}=\\frac{100}{25}=4$, so $Z_i=\\frac{n_i}{n_i+k}=\\frac{5}{5+4}=\\frac{5}{9}\\approx 0.5556$.\n$\\hat\\theta_i = Z_i\\,\\bar{x}_i+(1-Z_i)\\,\\mu = 0.5556(80)+0.4444(70)$\n$\\approx 44.44 + 31.11 = 75.56$.\nThe group estimate is shrunk from its raw mean $80$ partway toward the global mean $70$ — more shrinkage when $n_i$ is small.",
      "tag": "MCMC & hierarchical"
    }
  ]
}