Exam P — Common Traps Flashcards
A rapid-review cram deck of the classic mistakes candidates make on SOA/CAS Exam P — each card sets up the tempting trap, then states the correct rule and formula plainly.
35 cards · 10 topics · Free · fact-checked · LaTeX math
- *distributions* · A problem says 'a die is rolled until the first six.' You're handed $E[X]=q/p$. Trap?
  **Trap:** there are two geometric conventions. 'Number of *trials* until the first success' has support $\{1,2,\dots\}$ and mean $E[X]=\dfrac{1}{p}$. 'Number of *failures* before the first success' has support $\{0,1,\dots\}$ and mean $\dfrac{q}{p}$. Counting rolls gives $E[X]=\dfrac{1}{p}=6$, not $\dfrac{5/6}{1/6}=5$. Read which quantity is counted.
- *distributions* · Both geometric forms have the *same* variance, so does the convention even matter for $\mathrm{Var}$?
  **Trap:** if $X$ counts trials and $Y=X-1$ counts failures, $Y$ is just a shift of $X$, so $\mathrm{Var}(Y)=\mathrm{Var}(X)=\dfrac{q}{p^{2}}$ under both conventions. The convention changes only the *mean* (by exactly $1$) and the *pmf support*. So don't second-guess the variance, but always re-check the mean and which $\Pr(X=k)$ the problem wants.
- *independence* · You compute $\mathrm{Cov}(X,Y)=0$. Can you now treat $X$ and $Y$ as independent?
  **Trap:** no. Independence $\Rightarrow$ uncorrelated, but **not** the converse. Zero covariance rules out only a *linear* relationship; $X$ and $Y$ can be strongly dependent nonlinearly (e.g. $Y=X^{2}$ with $X$ symmetric about $0$ gives $\mathrm{Cov}(X,Y)=0$). The exception to know for the exam: for a **bivariate normal** pair, $\rho=0$ *does* imply independence.
- *variance-rules* · An exam answer for $\mathrm{Var}(X+Y)$ reads $\mathrm{Var}(X)+\mathrm{Var}(Y)$. When is that wrong?
  **Trap:** dropping the covariance term. In general $\mathrm{Var}(X+Y)=\mathrm{Var}(X)+\mathrm{Var}(Y)+2\,\mathrm{Cov}(X,Y)$. The cross term vanishes **only** if $X$ and $Y$ are uncorrelated (e.g. independent). If the problem gives a covariance or correlation, it almost certainly wants the $2\,\mathrm{Cov}(X,Y)$ term.
- *variance-rules* · $X-Y$: do you subtract the variances?
  **Trap:** variances **add**; only the covariance sign flips: $\mathrm{Var}(X-Y)=\mathrm{Var}(X)+\mathrm{Var}(Y)-2\,\mathrm{Cov}(X,Y)$. For *independent* $X,Y$ this is $\mathrm{Var}(X)+\mathrm{Var}(Y)$, never $\mathrm{Var}(X)-\mathrm{Var}(Y)$. For independent $X\sim N(\mu_1,\sigma_1^{2})$ and $Y\sim N(\mu_2,\sigma_2^{2})$, the difference is $X-Y\sim N(\mu_1-\mu_2,\ \sigma_1^{2}+\sigma_2^{2})$.
- *variance-rules* · You scale a loss by $3$: is $\mathrm{Var}(3X)=3\,\mathrm{Var}(X)$?
  **Trap:** no, the constant comes out **squared**: $\mathrm{Var}(aX)=a^{2}\,\mathrm{Var}(X)$, so $\mathrm{Var}(3X)=9\,\mathrm{Var}(X)$. Additive shifts drop out entirely: $\mathrm{Var}(aX+b)=a^{2}\,\mathrm{Var}(X)$. For the standard deviation, $\mathrm{SD}(aX+b)=|a|\,\mathrm{SD}(X)$.
- *variance-rules* · For the sample mean $\bar X$ of $n$ iid draws, you write $\mathrm{Var}(\bar X)=\sigma^{2}$. Spot the error.
  **Trap:** forgetting the $1/n$. With $\bar X=\dfrac{1}{n}\sum_{i=1}^{n}X_i$, the $1/n$ comes out squared: $\mathrm{Var}(\bar X)=\dfrac{1}{n^{2}}\cdot n\sigma^{2}=\dfrac{\sigma^{2}}{n}$. The standard error is $\dfrac{\sigma}{\sqrt{n}}$, not $\sigma$. (The *sum* has variance $n\sigma^{2}$; don't confuse sum with mean.)
- *clt-normal* · Approximating a discrete count with the normal: which way does the $\pm 0.5$ go for $\Pr(X\leq k)$ vs $\Pr(X\geq k)$?
  **Trap:** wrong continuity-correction direction. To *include* the integer $k$, widen the interval so it covers $k$: $\Pr(X\leq k)\approx\Pr\!\big(Z\leq\frac{k+0.5-\mu}{\sigma}\big)$ and $\Pr(X\geq k)\approx\Pr\!\big(Z\geq\frac{k-0.5-\mu}{\sigma}\big)$. For a strict $\Pr(X<k)$ use $k-0.5$; for $\Pr(X>k)$ use $k+0.5$. Apply this only to **integer-valued** variables.
- *distributions* · A 'given it has already lasted $s$' lifetime problem tempts you to use memorylessness. Always valid?
  **Trap:** memorylessness holds **only** for the exponential (continuous) and the geometric (discrete). For those, $\Pr(X>s+t\mid X>s)=\Pr(X>t)$. For a Weibull or gamma (outside their exponential special case), Pareto, uniform, etc., compute the conditional probability directly: $\Pr(X>s+t\mid X>s)=\dfrac{S(s+t)}{S(s)}$.
- *loss-models* · Under deductible $d$, a problem asks for the average payment 'on claims that are paid.' Is that $E[(X-d)_{+}]$?
  **Trap:** confusing per-loss with per-payment. **Per loss** averages over *all* losses (including the $0$ payments below $d$): $E[(X-d)_{+}]$. **Per payment** conditions on $X>d$: $E[X-d\mid X>d]=\dfrac{E[(X-d)_{+}]}{S(d)}$. 'Paid claims' / 'per payment' is the *larger* conditional figure: divide by $S(d)$.
- *expectation* · Given $E[X]$, is $E[g(X)]=g(E[X])$ a safe shortcut (e.g. $E[X^{2}]=(E[X])^{2}$)?
  **Trap:** Jensen's inequality. In general $E[g(X)]\neq g(E[X])$. For a *convex* $g$, $E[g(X)]\geq g(E[X])$ (e.g. $E[X^{2}]\geq(E[X])^{2}$, the gap being exactly $\mathrm{Var}(X)\geq 0$); for a *concave* $g$, the inequality reverses. Equality is automatic when $g$ is linear; for a strictly convex or strictly concave $g$ (such as $x^{2}$), it holds only when $X$ is constant.
- *probability-rules* · A test has $99\%$ sensitivity and the disease prevalence is $0.1\%$. A positive test: is the patient probably sick?
  **Trap:** base-rate neglect. A high true-positive rate does **not** mean a high $\Pr(\text{disease}\mid +)$ when the prior is tiny. Use Bayes: $\Pr(D\mid +)=\dfrac{\Pr(+\mid D)\,\Pr(D)}{\Pr(+\mid D)\Pr(D)+\Pr(+\mid D^{c})\Pr(D^{c})}$. With a rare disease, even a small false-positive rate $\Pr(+\mid D^{c})$ applied to the large healthy group produces false positives that swamp the true ones, so $\Pr(D\mid +)$ can stay small.
- *probability-rules* · $\Pr(A\cup B\cup C)$: you add the three single probabilities and the three pairwise ones. Done?
  **Trap:** miscounting in inclusion–exclusion. You must *subtract* the pairwise overlaps and *add back* the triple: $\Pr(A\cup B\cup C)=\Pr(A)+\Pr(B)+\Pr(C)-\Pr(A\cap B)-\Pr(A\cap C)-\Pr(B\cap C)+\Pr(A\cap B\cap C)$. Adding the overlaps instead of subtracting them (or omitting the $+\Pr(A\cap B\cap C)$) is the classic Venn-diagram error.
- *densities* · A pdf is given as $f(x)=c\,x^{2}$ on $[0,2]$. You start integrating with $c=1$. First step?
  **Trap:** assuming the density is already normalized. A pdf must integrate to $1$: solve $\int_{0}^{2}c\,x^{2}\,dx=1\Rightarrow c\cdot\frac{8}{3}=1\Rightarrow c=\frac{3}{8}$. **Always find the normalizing constant first** (and confirm $f\geq 0$ on the support) before computing any probability or moment.
- *transformations* · $X\sim U(0,1)$, $Y=X^{2}$. You write $f_Y(y)=f_X(\sqrt{y})$. What's missing?
  **Trap:** forgetting the Jacobian. For a monotone $Y=g(X)$, $f_Y(y)=f_X\!\big(g^{-1}(y)\big)\left|\dfrac{dx}{dy}\right|$, the **absolute value** of the derivative of the inverse. Here $x=\sqrt{y}$ and $\frac{dx}{dy}=\frac{1}{2\sqrt{y}}$, so $f_Y(y)=1\cdot\frac{1}{2\sqrt{y}}$ on $(0,1)$. Omitting $\left|\frac{dx}{dy}\right|$ is the most common transformation error.
- *transformations* · $X_1,\dots,X_n$ are independent $\mathrm{Exp}(\lambda)$. The minimum: is its rate $\lambda/n$?
  **Trap:** the minimum has the **summed** rate, not the averaged one. $\Pr(\min>x)=\prod_{i=1}^{n}e^{-\lambda x}=e^{-n\lambda x}$, so $\min\sim\mathrm{Exp}(n\lambda)$ with mean $\dfrac{1}{n\lambda}$. The **maximum** is *not* exponential at all: its cdf is $F(x)^{n}=(1-e^{-\lambda x})^{n}$, and $E[\max]=\dfrac{1}{\lambda}\sum_{k=1}^{n}\dfrac{1}{k}$.
- *estimation* · An estimator question: is $\frac{1}{n}\sum (X_i-\bar X)^2$ an unbiased estimator of $\sigma^2$?
  **Trap:** dividing by $n$ gives the **biased** variance estimator (the MLE under a normal model). The **unbiased** sample variance divides by $n-1$: $S^{2}=\dfrac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar X)^{2}$, with $E[S^{2}]=\sigma^{2}$. Dividing by $n$ understates $\sigma^{2}$ because $\bar X$ is estimated from the same data (one degree of freedom is lost).
- *distributions* · $E[X]=\mu$ for the geometric, but which $\mu$? You see $E[X]=\frac{1-p}{p}$ in a table and $\frac{1}{p}$ in your notes.
  **Trap:** the same 'geometric' label hides two means. *Trials* form: $E[X]=\dfrac{1}{p}$, support $\{1,2,\dots\}$, pmf $q^{x-1}p$. *Failures* form: $E[X]=\dfrac{1-p}{p}=\dfrac{q}{p}$, support $\{0,1,\dots\}$, pmf $q^{x}p$. Tables and software differ, so anchor to the *support* the problem describes, then pick the matching mean and pmf.
- *independence* · $\mathrm{Cov}(X,Y)=0$ for a bivariate normal pair: independent or not?
  **Trap (the exception to the exception):** for a **bivariate normal** pair, $\rho=0\Rightarrow$ independence; this is the standard family where uncorrelated *does* mean independent. But beware: two *marginally* normal variables that aren't *jointly* normal can be uncorrelated yet dependent. The implication needs full joint normality, not just normal margins.
- *variance-rules* · Linear combination $aX+bY$ of dependent variables: is $\mathrm{Var}(aX+bY)=a^{2}\mathrm{Var}(X)+b^{2}\mathrm{Var}(Y)$?
  **Trap:** missing the cross term *and* its $ab$ factor. The full rule is $\mathrm{Var}(aX+bY)=a^{2}\mathrm{Var}(X)+b^{2}\mathrm{Var}(Y)+2ab\,\mathrm{Cov}(X,Y)$. The covariance term carries $2ab$, so when $a$ and $b$ have opposite signs (e.g. a difference, $a=1$, $b=-1$) it enters with the opposite sign. Drop it only when $\mathrm{Cov}(X,Y)=0$; independence is sufficient but not necessary, since uncorrelated dependent variables also have zero covariance.
- *clt-normal* · A binomial $\Pr(X\leq 45)$ with $n=100$, $p=0.5$ via the normal approximation. You plug in $45$ directly. Fix it.
  **Trap:** no continuity correction on a discrete count. Here $\mu=50$ and $\sigma=\sqrt{100(0.5)(0.5)}=5$. To include $45$, use $45+0.5$: $\Pr(X\leq 45)\approx\Pr\!\big(Z\leq\frac{45.5-50}{5}\big)=\Pr(Z\leq-0.9)\approx 0.1841$. Plugging in $45$ gives $\Pr(Z\leq-1)\approx 0.1587$, which understates the probability.
- *distributions* · Memoryless 'expected additional wait': the wait for a bus is exponential with mean $10$, and you've waited $7$. Expected remaining wait?
  **Trap:** subtracting elapsed time. For an **exponential**, memorylessness means the expected *remaining* wait is still the full mean, $E[X-7\mid X>7]=10$, **not** $10-7=3$. The geometric behaves the same way in discrete time. For any *non-memoryless* lifetime this shortcut is invalid.
- *loss-models* · Per-payment expected cost: you compute $E[(X-d)_{+}]$ and report it as 'average payment per claim paid.' Right?
  **Trap:** you skipped dividing by $S(d)$. Per-loss $E[(X-d)_{+}]$ already averages in the zeros from losses below $d$. To get the **per-payment** figure, condition on a payment: $\dfrac{E[(X-d)_{+}]}{S(d)}$. For exponential losses with mean $\theta$, this conditional mean is just $\theta$ regardless of $d$ (memorylessness), a fast sanity check.
- *expectation* · You need $E[1/X]$ and reach for $1/E[X]$. Safe?
  **Trap:** Jensen again. $g(x)=1/x$ is convex on $(0,\infty)$, so for a positive $X$, $E\!\big[\tfrac{1}{X}\big]\geq\dfrac{1}{E[X]}$, with strict inequality unless $X$ is constant. You must compute $\int \tfrac{1}{x}f(x)\,dx$. The same warning applies to $E[\sqrt{X}]\neq\sqrt{E[X]}$ (concave $\Rightarrow E[\sqrt X]\leq\sqrt{E[X]}$).
- *probability-rules* · Bayes: you need $\Pr(B\mid A)$. Can you just reuse $\Pr(A\mid B)$ as if it were the same number?
  **Trap:** confusing the conditioning direction (the 'prosecutor's fallacy'). In general $\Pr(A\mid B)\neq\Pr(B\mid A)$; they're linked by $\Pr(B\mid A)=\dfrac{\Pr(A\mid B)\,\Pr(B)}{\Pr(A)}$. You must weight by the prior $\Pr(B)$ and normalize by $\Pr(A)$ (often via the law of total probability). Swapping them ignores the base rate.
- *probability-rules* · $60\%$ speak French and $50\%$ speak Spanish. What's $\Pr(\text{at least one})$: is it $0.6+0.5=1.1$?
  **Trap:** a probability above $1$ flags double counting. $\Pr(F\cup S)=\Pr(F)+\Pr(S)-\Pr(F\cap S)$, so you must subtract the bilingual overlap. Because $\Pr(F\cup S)\leq 1$, the data *force* $\Pr(F\cap S)\geq 0.6+0.5-1=0.1$ (Bonferroni's inequality). Without $\Pr(F\cap S)$ you can only say $0.6\leq\Pr(F\cup S)\leq 1$.
- *densities* · A density $f(x)=k(1-x)$ on $[0,1]$, and you're asked for $\Pr(X>0.5)$. You integrate with $k=1$. Problem?
  **Trap:** any probability you compute is wrong until $f$ is normalized. Solve $\int_{0}^{1}k(1-x)\,dx=k\cdot\tfrac{1}{2}=1\Rightarrow k=2$. Then $\Pr(X>0.5)=\int_{0.5}^{1}2(1-x)\,dx=0.25$. With $k=1$ you would get $0.125$, off by the missing factor of $2$.
- *transformations* · Bivariate transform $(U,V)=g(X,Y)$: you multiply by the Jacobian of $g$ itself. Correct factor?
  **Trap:** wrong Jacobian *and* a missing absolute value. You need the determinant of the **inverse** map, expressing $(x,y)$ in terms of $(u,v)$: $f_{U,V}(u,v)=f_{X,Y}\big(x(u,v),y(u,v)\big)\,|J|$, where $J=\dfrac{\partial x}{\partial u}\dfrac{\partial y}{\partial v}-\dfrac{\partial x}{\partial v}\dfrac{\partial y}{\partial u}$. Take $|J|$ (equivalently, the reciprocal of the absolute Jacobian of $g$ at the matching point), and re-express the support in the new variables.
- *transformations* · Two independent $\mathrm{Exp}(\lambda)$ machines; the system fails when **both** fail. Is the time to failure exponential?
  **Trap:** 'both fail' is the **maximum**, which is *neither* exponential *nor* memoryless. Its cdf is $(1-e^{-\lambda t})^{2}$ and $E[\max]=\dfrac{1}{\lambda}\big(1+\tfrac{1}{2}\big)=\dfrac{3}{2\lambda}$. Only 'the system fails when the **first** machine fails' (a series system, i.e. the minimum) gives an exponential, $\mathrm{Exp}(2\lambda)$.
- *estimation* · You report the population-variance formula $\frac{1}{n}\sum(X_i-\bar X)^2$ as the standard 'sample variance.' Bias?
  **Trap:** that divisor understates $\sigma^{2}$: $E\!\big[\tfrac{1}{n}\sum(X_i-\bar X)^{2}\big]=\dfrac{n-1}{n}\sigma^{2}<\sigma^{2}$. Multiply by $\dfrac{n}{n-1}$ (Bessel's correction) to remove the bias, giving $S^{2}=\dfrac{1}{n-1}\sum(X_i-\bar X)^{2}$. Divide by $n$ only when the true mean $\mu$ is *known* and used in place of $\bar X$, i.e. $\frac{1}{n}\sum(X_i-\mu)^{2}$.
- *variance-rules* · Standardizing a sum $S_n=\sum X_i$ for the CLT: do you divide by $\sigma$ or $\sigma\sqrt{n}$?
  **Trap:** using the wrong scale. The **sum** has SD $\sigma\sqrt{n}$, so $\dfrac{S_n-n\mu}{\sigma\sqrt{n}}\approx N(0,1)$. The **mean** has SD $\sigma/\sqrt{n}$, so $\dfrac{\bar X-\mu}{\sigma/\sqrt{n}}\approx N(0,1)$. Mixing up sum scale and mean scale (e.g. dividing the sum by $\sigma/\sqrt n$) is a frequent CLT slip.
- *densities* · $\Pr(X=k)$ for a continuous random variable: you compute a positive number. Possible?
  **Trap:** for any *continuous* variable, $\Pr(X=k)=0$; single points carry zero probability. Hence $\Pr(X\leq k)=\Pr(X<k)$, and $\leq$ vs $<$ doesn't matter. This *fails* for **mixed** distributions (e.g. a censored loss $X\wedge u$ has an atom of size $\Pr(X\geq u)$ at $u$), where the boundary mass is real.
- *variance-rules* · $\mathrm{Var}(X+Y+Z)$ for three *pairwise dependent* variables: just sum the three variances?
  **Trap:** you owe **three** covariance terms, one per pair: $\mathrm{Var}(X+Y+Z)=\mathrm{Var}(X)+\mathrm{Var}(Y)+\mathrm{Var}(Z)+2[\mathrm{Cov}(X,Y)+\mathrm{Cov}(X,Z)+\mathrm{Cov}(Y,Z)]$. In general $\mathrm{Var}\!\big(\sum_i X_i\big)=\sum_i\mathrm{Var}(X_i)+2\sum_{i<j}\mathrm{Cov}(X_i,X_j)$.
- *distributions* · A 'memoryless' wording trap: a *used* part's lifetime is uniform on $[0,10]$, and $5$ years have elapsed. Is the expected remaining life $5$?
  **Trap:** the uniform is **not** memoryless. Given survival to $5$, the remaining life is uniform on $[0,5]$, so the expected residual is $\dfrac{5}{2}=2.5$, not the unconditional mean $5$. Only exponential (and, in discrete time, geometric) lifetimes keep a constant expected residual; everything else needs $E[X-s\mid X>s]$ computed directly.
- *transformations* · Sample median of $n=4$ values: you treat the average of the two middle order statistics' *expectations* as if it described the median's distribution. Trap?
  **Trap:** each order statistic has its **own** density; its distribution is not a shortcut from a mean. For iid continuous $X_i$ with cdf $F$ and pdf $f$: $f_{X_{(k)}}(x)=\dfrac{n!}{(k-1)!(n-k)!}F(x)^{k-1}[1-F(x)]^{n-k}f(x)$. The max is the $k=n$ case and the min the $k=1$ case; don't conflate $E[\text{statistic}]$ with the statistic itself.
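A quick numeric check of the two geometric conventions from the 'die rolled until the first six' cards. The sketch sums each pmf over a long truncated support (the cutoff `K` is an arbitrary assumption; the neglected tail $(5/6)^{K}$ is vanishingly small) and should recover the means $6$ and $5$ alongside the shared variance $q/p^{2}=30$.

```python
# Sketch: compare the two geometric conventions for "roll a fair die until the first six".
# Assumption: the infinite series is truncated at K terms; (5/6)**K is negligible for K = 2000.

P = 1 / 6   # success probability (rolling a six)
Q = 1 - P   # failure probability
K = 2000    # truncation point for the series (arbitrary choice)


def moments(pmf, support):
    """Return (mean, variance) of a pmf summed over a finite support."""
    mean = sum(x * pmf(x) for x in support)
    second = sum(x * x * pmf(x) for x in support)
    return mean, second - mean ** 2


# Trials form: number of rolls, support {1, 2, ...}, pmf q^(x-1) p
trials_mean, trials_var = moments(lambda x: Q ** (x - 1) * P, range(1, K))
# Failures form: non-sixes before the first six, support {0, 1, ...}, pmf q^x p
fail_mean, fail_var = moments(lambda x: Q ** x * P, range(0, K))

print(f"trials form:   mean {trials_mean:.4f}, variance {trials_var:.4f}")
print(f"failures form: mean {fail_mean:.4f}, variance {fail_var:.4f}")
print(f"q/p^2 = {Q / P ** 2:.4f}")
```

The two means differ by exactly $1$ while the variances agree, which is the point of the 'same variance' card.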
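The zero-covariance card's counterexample can be verified exactly with rational arithmetic. This sketch assumes a hypothetical $X$ uniform on $\{-1,0,1\}$ (symmetric about $0$) and $Y=X^{2}$, then checks that the covariance is $0$ while the joint pmf fails to factor into the marginals.

```python
from fractions import Fraction

# Hypothetical example: X uniform on {-1, 0, 1}, Y = X^2 (Y is a function of X)
support = [-1, 0, 1]
p = Fraction(1, 3)                         # Pr(X = x) for each x
joint = {(x, x * x): p for x in support}   # joint pmf of (X, Y)

E_X = sum(prob * x for (x, y), prob in joint.items())
E_Y = sum(prob * y for (x, y), prob in joint.items())
E_XY = sum(prob * x * y for (x, y), prob in joint.items())
cov = E_XY - E_X * E_Y                     # Cov(X, Y) = E[XY] - E[X]E[Y]

# Independence would require Pr(X=0, Y=1) = Pr(X=0) * Pr(Y=1)
pr_X0 = sum(prob for (x, y), prob in joint.items() if x == 0)
pr_Y1 = sum(prob for (x, y), prob in joint.items() if y == 1)
pr_X0_Y1 = joint.get((0, 1), Fraction(0))

print("Cov(X, Y) =", cov)
print("Pr(X=0, Y=1) =", pr_X0_Y1, "but Pr(X=0) * Pr(Y=1) =", pr_X0 * pr_Y1)
```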
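The variance-rules cards (sums, differences, scaling, three-term sums) are all special cases of $\mathrm{Var}\big(\sum_i a_iX_i\big)=\sum_i\sum_j a_ia_j\,\mathrm{Cov}(X_i,X_j)$. A minimal sketch with covariance matrices made up purely for illustration (the numbers are hypothetical, not from the deck):

```python
def var_linear_combination(a, cov):
    """Var(sum_i a_i X_i) = sum_i sum_j a_i a_j Cov(X_i, X_j).

    cov[i][i] holds Var(X_i); the double sum visits each off-diagonal pair twice,
    which is where the factor 2 in front of each covariance comes from.
    """
    n = len(a)
    return sum(a[i] * a[j] * cov[i][j] for i in range(n) for j in range(n))


# Hypothetical: Var(X) = 4, Var(Y) = 9, Cov(X, Y) = 2
cov_xy = [[4, 2],
          [2, 9]]
print(var_linear_combination([1, 1], cov_xy))    # Var(X+Y) = 4 + 9 + 2*2
print(var_linear_combination([1, -1], cov_xy))   # Var(X-Y) = 4 + 9 - 2*2
print(var_linear_combination([3, 0], cov_xy))    # Var(3X)  = 3^2 * 4

# Hypothetical three-variable case with pairwise covariances 1, 0, 2
cov_xyz = [[4, 1, 0],
           [1, 9, 2],
           [0, 2, 16]]
print(var_linear_combination([1, 1, 1], cov_xyz))  # 4 + 9 + 16 + 2*(1 + 0 + 2)
```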
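For the $n=100$, $p=0.5$ continuity-correction card, the sketch below compares the exact binomial probability with the normal approximation evaluated at $45.5$ and at $45$. It uses only the standard library (`math.comb`, `math.erf`), writing the normal cdf in terms of the error function.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Standard normal cdf, Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1 + erf(z / sqrt(2)))


n, p, k = 100, 0.5, 45
mu = n * p                        # 50
sigma = sqrt(n * p * (1 - p))     # 5

# Exact Pr(X <= 45) for X ~ Binomial(100, 0.5)
exact = sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))
with_cc = normal_cdf((k + 0.5 - mu) / sigma)   # continuity-corrected: Pr(Z <= -0.9)
without_cc = normal_cdf((k - mu) / sigma)      # uncorrected: Pr(Z <= -1)

print(f"exact binomial        {exact:.4f}")
print(f"normal at k + 0.5     {with_cc:.4f}")
print(f"normal at k (no fix)  {without_cc:.4f}")
```

The corrected approximation lands close to the exact value, while the uncorrected one comes out noticeably smaller.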
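The memorylessness and per-payment cards reduce to one formula for a nonnegative loss or lifetime: $E[X-d\mid X>d]=\dfrac{1}{S(d)}\int_d^{\infty}S(x)\,dx$, where the integral is the per-loss $E[(X-d)_{+}]$. This sketch evaluates it with a hand-written composite Simpson's rule for the exponential bus wait (mean $10$, waited $7$) and the uniform $[0,10]$ part (survived $5$). Truncating the exponential integral at $507$ is an assumption that drops a tail of order $e^{-50}$.

```python
from math import exp


def simpson(g, a, b, steps=20_000):
    """Composite Simpson's rule for the integral of g over [a, b]; steps must be even."""
    h = (b - a) / steps
    total = g(a) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def stop_loss(S, d, upper):
    """Per-loss E[(X - d)_+] = integral of S(x) from d to upper (upper = end of the support)."""
    return simpson(S, d, upper)


def mean_excess(S, d, upper):
    """Per-payment / mean residual life: E[X - d | X > d] = E[(X - d)_+] / S(d)."""
    return stop_loss(S, d, upper) / S(d)


def S_exp(x):
    """Survival function of an exponential with mean 10."""
    return exp(-x / 10)


def S_unif(x):
    """Survival function of a Uniform[0, 10] lifetime."""
    return max(0.0, (10 - x) / 10)


print(f"exponential, waited 7: per-loss {stop_loss(S_exp, 7, 507):.4f}, "
      f"expected remaining {mean_excess(S_exp, 7, 507):.4f}")
print(f"uniform, survived 5:   expected remaining {mean_excess(S_unif, 5, 10):.4f}")
```

The exponential's expected remaining wait stays at the full mean, while the uniform part's drops to $2.5$.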
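To see the Jensen cards with numbers, take a hypothetical $X\sim U(1,3)$ (not from the deck). The closed-form expectations below should show $E[1/X]>1/E[X]$ for the convex $1/x$ and $E[\sqrt{X}]<\sqrt{E[X]}$ for the concave $\sqrt{x}$.

```python
from math import log, sqrt

# Hypothetical example: X ~ Uniform(a, b) with a = 1, b = 3
a, b = 1.0, 3.0

E_X = (a + b) / 2                                      # mean of the uniform
E_inv_X = log(b / a) / (b - a)                         # (1/(b-a)) * integral of 1/x dx
E_sqrt_X = (2 / 3) * (b ** 1.5 - a ** 1.5) / (b - a)   # (1/(b-a)) * integral of sqrt(x) dx

print(f"E[1/X]    = {E_inv_X:.4f}  vs  1/E[X]     = {1 / E_X:.4f}")
print(f"E[sqrt X] = {E_sqrt_X:.4f}  vs  sqrt(E[X]) = {sqrt(E_X):.4f}")
```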
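The base-rate card gives sensitivity and prevalence but not the false-positive rate $\Pr(+\mid D^{c})$, which Bayes' rule also needs. The sketch below keeps the card's $99\%$ sensitivity and $0.1\%$ prevalence and assumes, purely for illustration, false-positive rates of $1\%$ and $0.1\%$.

```python
def posterior_positive(sensitivity, prevalence, false_positive_rate):
    """Pr(D | +) via Bayes' rule, using total probability for Pr(+)."""
    true_pos = sensitivity * prevalence                  # Pr(+ | D) Pr(D)
    false_pos = false_positive_rate * (1 - prevalence)   # Pr(+ | D^c) Pr(D^c)
    return true_pos / (true_pos + false_pos)


# Sensitivity and prevalence from the card; the false-positive rates are assumptions
for fpr in (0.01, 0.001):
    print(f"false-positive rate {fpr}: Pr(D | +) = {posterior_positive(0.99, 0.001, fpr):.4f}")
```

Even with a $0.1\%$ false-positive rate the posterior stays just under one half, because healthy people outnumber sick ones roughly $999$ to $1$.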
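The two normalizing-constant cards can be checked numerically. This sketch writes out composite Simpson's rule by hand (no third-party library assumed); Simpson's rule is exact up to rounding for polynomials of degree at most $3$, so these polynomial densities come out exactly.

```python
def simpson(g, a, b, steps=1_000):
    """Composite Simpson's rule for the integral of g over [a, b]; steps must be even."""
    h = (b - a) / steps
    total = g(a) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


# f(x) = c x^2 on [0, 2]: choose c so the density integrates to 1
c = 1 / simpson(lambda x: x ** 2, 0, 2)

# f(x) = k (1 - x) on [0, 1], then Pr(X > 0.5)
k = 1 / simpson(lambda x: 1 - x, 0, 1)
prob = simpson(lambda x: k * (1 - x), 0.5, 1)
naive = simpson(lambda x: 1 - x, 0.5, 1)   # the same integral with k = 1 left in

print(f"c = {c:.6f}, k = {k:.6f}, Pr(X > 0.5) = {prob:.6f}, with k = 1: {naive:.6f}")
```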
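The min/max cards (including the two-machine system) can be sanity-checked by simulation. The sketch assumes a hypothetical rate $\lambda=1$ and $n=2$ machines, draws from Python's `random.expovariate` (whose argument is the rate), and compares the simulated means with $\frac{1}{n\lambda}$ and $\frac{1}{\lambda}\sum_{k=1}^{n}\frac{1}{k}$.

```python
import random


def expected_max_exponential(lam, n):
    """E[max of n iid Exp(lam)] = (1/lam) * (1 + 1/2 + ... + 1/n)."""
    return sum(1 / k for k in range(1, n + 1)) / lam


lam, n, trials = 1.0, 2, 100_000   # hypothetical rate, number of machines, simulation size
rng = random.Random(2024)          # fixed seed for a reproducible run

total_min = total_max = 0.0
for _ in range(trials):
    draws = [rng.expovariate(lam) for _ in range(n)]   # expovariate takes the rate
    total_min += min(draws)
    total_max += max(draws)

sim_min, sim_max = total_min / trials, total_max / trials
print(f"E[min]: simulated {sim_min:.3f}, theory 1/(n*lam) = {1 / (n * lam):.3f}")
print(f"E[max]: simulated {sim_max:.3f}, theory {expected_max_exponential(lam, n):.3f}")
```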
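The sample-variance cards can be confirmed by brute force. Assuming a fair six-sided die ($\sigma^{2}=35/12$) and a hypothetical sample size $n=2$, the sketch averages both estimators over all $36$ equally likely samples using exact fractions.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)   # fair die
n = 2                 # hypothetical sample size

mu = Fraction(sum(faces), 6)
sigma2 = sum((x - mu) ** 2 for x in faces) / 6   # population variance


def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)


samples = list(product(faces, repeat=n))          # all 6^n equally likely samples
E_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
E_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print("sigma^2 =", sigma2)
print("E[divide by n] =", E_div_n, "  E[divide by n-1] =", E_div_n_minus_1)
```

Dividing by $n$ averages to $\frac{n-1}{n}\sigma^{2}$, while dividing by $n-1$ recovers $\sigma^{2}$ exactly.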
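The order-statistic density from the last card can be checked numerically. This sketch assumes a hypothetical Uniform$(0,1)$ parent with $n=4$, for which each $X_{(k)}$ should integrate to $1$ and have mean $k/(n+1)$; the integrals again use a hand-written Simpson's rule.

```python
from math import factorial


def order_stat_pdf(x, k, n, F, f):
    """Density of the k-th order statistic of n iid continuous draws with cdf F and pdf f."""
    const = factorial(n) / (factorial(k - 1) * factorial(n - k))
    return const * F(x) ** (k - 1) * (1 - F(x)) ** (n - k) * f(x)


def simpson(g, a, b, steps=1_000):
    """Composite Simpson's rule for the integral of g over [a, b]; steps must be even."""
    h = (b - a) / steps
    total = g(a) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def F(x):
    return x      # Uniform(0, 1) cdf on [0, 1]


def f(x):
    return 1.0    # Uniform(0, 1) pdf on [0, 1]


n = 4
for k in range(1, n + 1):
    mass = simpson(lambda x: order_stat_pdf(x, k, n, F, f), 0, 1)
    mean = simpson(lambda x: x * order_stat_pdf(x, k, n, F, f), 0, 1)
    print(f"k={k}: total mass {mass:.6f}, E[X_(k)] {mean:.6f}, k/(n+1) {k / (n + 1):.6f}")
```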