Exam SRM — Decision Trees & Ensembles Practice Flashcards

Thirty exam-realistic multiple-choice problems on SOA Exam SRM tree-based methods — regression-tree region means and RSS splits, Gini, cross-entropy, and classification-error impurity and impurity reduction, cost-complexity (weakest-link) pruning, bagging with out-of-bag error, random forests with $m\approx\sqrt{p}$ decorrelation, and boosting tuning ($\lambda$, depth $d$, number of trees) — each with a fully worked solution.

8 free sample · 30 total in app · Free · fact-checked · LaTeX math



Unlock the full set

You're studying a free 8-problem sample. All 30 Decision Trees & Ensembles practice problems — plus every other Exam SRM subject and spaced-repetition scheduling — are built into the Willys AI Flashcards & Quizzes app. 14-day free trial, then $14.99.


Every deck is built into the Willys app

All of these decks — including the full practice problem banks — come built into Willys AI Flashcards & Quizzes for iPhone & iPad (Mac version coming soon), with FSRS + SM-2 spaced repetition, streaks, and exam-date cram mode. 14-day free trial, then $14.99. To load a deck in the app: Settings → Library → Browse, then pick your exam and deck.


More Exam SRM decks:

Clustering & KNN · Clustering & KNN Practice · Decision Trees & Ensembles · Generalized Linear Models · Generalized Linear Models Practice · Linear Regression



Regression trees
A regression-tree node contains the responses $\{6,10,11,13,20\}$. The leaf predicts the mean response. Calculate the residual sum of squares (RSS) contributed by this leaf. (A) $0$ (B) $14.0$ (C) $21.2$ (D) $106.0$ (E) $132.5$
**Answer: (D).** The leaf prediction is the mean of its training responses: $\hat y_{R_m}=\dfrac{6+10+11+13+20}{5}=\dfrac{60}{5}=12$. Then RSS $=\sum_{i\in R_m}(y_i-\hat y_{R_m})^2=(6-12)^2+(10-12)^2+(11-12)^2+(13-12)^2+(20-12)^2=36+4+1+1+64=106.0$. Distractor (C) $21.2$ divides the RSS by $5$ (the variance, not the sum); (B) $14.0$ is the range $20-6$, not a sum of squares. Because the leaf predicts the mean, the correct figure is the raw sum of squared deviations, $106.0$.
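A minimal Python sketch of the leaf RSS computation, assuming the leaf predicts the plain mean of its responses. The function name `leaf_rss` is a hypothetical helper for illustration, not a library API.

```python
def leaf_rss(ys):
    """RSS of a regression-tree leaf that predicts the mean of its responses."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

node = [6, 10, 11, 13, 20]
rss = leaf_rss(node)
print(rss)              # 106.0
print(rss / len(node))  # 21.2, the population variance (distractor C)
```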
Regression trees
A regression node holds the responses $\{5,8,9,14\}$. A candidate split sends $\{5,8\}$ to the left child and $\{9,14\}$ to the right child, each leaf predicting its own mean. Calculate the total RSS of this split. (A) $4.5$ (B) $8.67$ (C) $17.0$ (D) $21.0$ (E) $42.0$
**Answer: (C).** Each child predicts its mean. Left $\{5,8\}$: mean $=\dfrac{5+8}{2}=6.5$, RSS $=(5-6.5)^2+(8-6.5)^2=2.25+2.25=4.5$. Right $\{9,14\}$: mean $=\dfrac{9+14}{2}=11.5$, RSS $=(9-11.5)^2+(14-11.5)^2=6.25+6.25=12.5$. Total split RSS $=4.5+12.5=17.0$. The parent (no split) has mean $9$ and RSS $=(5-9)^2+(8-9)^2+(9-9)^2+(14-9)^2=16+1+0+25=42.0$ (distractor E), so the split cuts RSS from $42.0$ to $17.0$.
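The split RSS is just the sum of the two child-leaf RSS values, which a short sketch makes explicit. The helpers `leaf_rss` and `split_rss` are hypothetical names chosen for this example.

```python
def leaf_rss(ys):
    """RSS of a leaf that predicts the mean of its responses."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def split_rss(left, right):
    """Total RSS of a binary split: each child predicts its own mean."""
    return leaf_rss(left) + leaf_rss(right)

parent = [5, 8, 9, 14]
left, right = [5, 8], [9, 14]
print(leaf_rss(left), leaf_rss(right))  # 4.5 12.5
print(split_rss(left, right))           # 17.0
print(leaf_rss(parent))                 # 42.0 (no split)
```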
Regression trees
A regression node holds $\{5,8,9,14\}$. Three splits are considered: A $=\{5,8\}\mid\{9,14\}$, B $=\{5,8,9\}\mid\{14\}$, C $=\{5\}\mid\{8,9,14\}$. Greedy recursive binary splitting picks the split with the smallest total RSS. Which split is chosen? (A) Split A, total RSS $=17.0$ (B) Split B, total RSS $=8.67$ (C) Split C, total RSS $=20.67$ (D) Split B, total RSS $=12.67$ (E) All three tie at RSS $=14.0$
**Answer: (B).** Each leaf predicts its mean; compute each split's total RSS. **A** $\{5,8\}\mid\{9,14\}$: means $6.5,11.5$; RSS $=4.5+12.5=17.0$. **B** $\{5,8,9\}\mid\{14\}$: left mean $=\dfrac{22}{3}\approx7.333$, RSS $=(5-7.333)^2+(8-7.333)^2+(9-7.333)^2\approx5.444+0.444+2.778=8.667$; right is a single point, RSS $=0$. Total $\approx8.67$. **C** $\{5\}\mid\{8,9,14\}$: left RSS $=0$; right mean $=\dfrac{31}{3}\approx10.333$, RSS $\approx5.444+1.778+13.444=20.67$. The smallest is Split B at $8.67$, so the greedy algorithm chooses **Split B**. (Choice A picks the first-listed split, whose RSS of $17.0$ is not the smallest; choice C reports Split C's RSS correctly, but $20.67$ is the largest of the three; choice D names the right split with the wrong RSS.)
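The greedy search can be sketched as a loop over every cutpoint. This assumes the node's observations are already sorted by the splitting predictor, so each cutpoint splits the list into a prefix and a suffix; with $\{5,8,9,14\}$ the three cutpoints are exactly Splits C, A, and B. The names `leaf_rss` and `best_split` are hypothetical.

```python
def leaf_rss(ys):
    """RSS of a leaf that predicts the mean of its responses."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(ys):
    """Try every prefix/suffix cutpoint of ys (assumed sorted by the
    splitting predictor) and return (k, total_rss) with the smallest RSS."""
    candidates = [(k, leaf_rss(ys[:k]) + leaf_rss(ys[k:]))
                  for k in range(1, len(ys))]
    return min(candidates, key=lambda c: c[1])

ys = [5, 8, 9, 14]
for k in range(1, len(ys)):
    print(ys[:k], "|", ys[k:], round(leaf_rss(ys[:k]) + leaf_rss(ys[k:]), 2))
# [5] | [8, 9, 14] 20.67
# [5, 8] | [9, 14] 17.0
# [5, 8, 9] | [14] 8.67
k, total = best_split(ys)
print(k, round(total, 2))  # 3 8.67  (Split B)
```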
Regression trees
A regression tree splits a node of $4$ observations into two leaves with RSS values $4.5$ and $12.5$. The unsplit parent node has RSS $42.0$. Calculate the reduction in RSS achieved by this split. (A) $8.0$ (B) $17.0$ (C) $25.0$ (D) $29.5$ (E) $42.0$
**Answer: (C).** The split's total RSS is the sum over the two child leaves: $4.5+12.5=17.0$. Reduction $=\text{RSS}_{\text{parent}}-\text{RSS}_{\text{split}}=42.0-17.0=25.0$. Greedy recursive binary splitting chooses, at each node, the predictor/cutpoint pair that maximizes exactly this reduction. Distractor (B) $17.0$ reports the post-split RSS rather than the reduction; (D) subtracts only one child's RSS ($42.0-12.5=29.5$).
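As a quick arithmetic check, the reduction is the parent RSS minus the summed child RSS; subtracting only one child reproduces distractor (D). The helper name `rss_reduction` is illustrative only.

```python
def rss_reduction(parent_rss, child_rss):
    """Drop in RSS from a split: parent RSS minus the sum of child-leaf RSS."""
    return parent_rss - sum(child_rss)

print(rss_reduction(42.0, [4.5, 12.5]))  # 25.0 (correct)
print(rss_reduction(42.0, [12.5]))       # 29.5 (distractor D: one child only)
```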
Impurity measures
A two-class classification node contains $70$ observations of class A and $30$ of class B. Calculate the Gini index $G=\sum_k \hat p_k(1-\hat p_k)$. (A) $0.30$ (B) $0.42$ (C) $0.50$ (D) $0.58$ (E) $0.70$
**Answer: (B).** Proportions: $\hat p_A=\dfrac{70}{100}=0.7$, $\hat p_B=\dfrac{30}{100}=0.3$. $G=\hat p_A(1-\hat p_A)+\hat p_B(1-\hat p_B)=0.7(0.3)+0.3(0.7)=0.21+0.21=0.42$. Equivalently $G=2\hat p(1-\hat p)=2(0.7)(0.3)=0.42$, or $G=1-\sum_k\hat p_k^2=1-(0.49+0.09)=0.42$. Distractor (A) $0.30$ is the classification error $1-\max_k\hat p_k$, not the Gini; (E) $0.70$ is the majority proportion itself.
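The two Gini forms used above, $\sum_k\hat p_k(1-\hat p_k)$ and $1-\sum_k\hat p_k^2$, can be checked against each other from raw class counts. Both function names are hypothetical.

```python
def gini(counts):
    """Gini index as sum_k p_k (1 - p_k), from class counts."""
    n = sum(counts)
    return sum((c / n) * (1 - c / n) for c in counts)

def gini_alt(counts):
    """Equivalent form 1 - sum_k p_k^2."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

print(round(gini([70, 30]), 2))      # 0.42
print(round(gini_alt([70, 30]), 2))  # 0.42
```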
Impurity measures
A two-class node has class-1 proportion $\hat p=0.7$. Calculate the cross-entropy $D=-\sum_k \hat p_k\ln\hat p_k$. (A) $0.300$ (B) $0.420$ (C) $0.611$ (D) $0.881$ (E) $0.915$
**Answer: (C).** With proportions $0.7$ and $0.3$: $D=-[0.7\ln 0.7+0.3\ln 0.3]$. Using $\ln 0.7\approx-0.356675$ and $\ln 0.3\approx-1.203973$: $D=-[0.7(-0.356675)+0.3(-1.203973)]=-[-0.249672-0.361192]=0.610864\approx0.611$. Distractor (B) $0.420$ is the Gini index $2(0.7)(0.3)$; (A) $0.300$ is the classification error. Cross-entropy uses natural logs of each proportion, weighted by that proportion.
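A small sketch of the cross-entropy, skipping $\hat p_k=0$ terms (the usual $0\ln0=0$ convention). Passing base-2 logs instead of natural logs gives about $0.881$, which is why choice (D) is a tempting wrong answer. The function name `cross_entropy` and its `log` parameter are illustrative assumptions.

```python
import math

def cross_entropy(props, log=math.log):
    """D = -sum_k p_k log p_k over class proportions, skipping p_k = 0."""
    return -sum(p * log(p) for p in props if p > 0)

props = [0.7, 0.3]
print(round(cross_entropy(props), 3))             # 0.611 (natural log)
print(round(cross_entropy(props, math.log2), 3))  # 0.881 (base 2, choice D)
```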
Impurity measures
A three-class node has counts $A=15$, $B=25$, $C=10$ (total $50$). Calculate the classification error rate $E=1-\max_k\hat p_k$, the Gini index $G$, and the cross-entropy $D$, and identify which row below is correct. (A) $E=0.50$, $G=0.62$, $D=1.030$ (B) $E=0.50$, $G=0.38$, $D=0.673$ (C) $E=0.30$, $G=0.62$, $D=1.030$ (D) $E=0.50$, $G=0.62$, $D=0.673$ (E) $E=0.70$, $G=0.38$, $D=1.030$
**Answer: (A).** Proportions: $\hat p_A=0.3$, $\hat p_B=0.5$, $\hat p_C=0.2$. **Classification error:** $E=1-\max_k\hat p_k=1-0.5=0.50$. **Gini:** $G=1-\sum_k\hat p_k^2=1-(0.09+0.25+0.04)=1-0.38=0.62$. **Cross-entropy:** $D=-[0.3\ln0.3+0.5\ln0.5+0.2\ln0.2]=-[0.3(-1.203973)+0.5(-0.693147)+0.2(-1.609438)]=-[-0.361192-0.346574-0.321888]=1.029654\approx1.030$. Distractor (B) reports $G=0.38$, which is $\sum_k\hat p_k^2$ itself rather than its complement $1-\sum_k\hat p_k^2$, together with an incorrect cross-entropy of $0.673$; (C) has the right $G$ and $D$ but gives $E=0.30$ instead of $1-\max_k\hat p_k=1-0.5=0.50$.
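All three impurity measures can be computed from the same class counts in one pass. The sketch below assumes natural logs for the cross-entropy, matching the solution; `impurities` is a hypothetical helper name.

```python
import math

def impurities(counts):
    """Return (classification error, Gini, cross-entropy) for class counts."""
    n = sum(counts)
    p = [c / n for c in counts]
    error = 1 - max(p)
    gini = 1 - sum(pk ** 2 for pk in p)
    entropy = -sum(pk * math.log(pk) for pk in p if pk > 0)
    return error, gini, entropy

E, G, D = impurities([15, 25, 10])
print(round(E, 2), round(G, 2), round(D, 3))  # 0.5 0.62 1.03
```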
Impurity measures
A parent classification node of $50$ observations has class-1 proportion $\hat p=0.6$. A split produces a left child of $20$ observations with $\hat p=1.0$ (pure) and a right child of $30$ observations with $\hat p=\tfrac{1}{3}$. Calculate the reduction in the Gini index from this split (parent Gini minus weighted child Gini). (A) $0.107$ (B) $0.160$ (C) $0.213$ (D) $0.267$ (E) $0.480$
**Answer: (C).** Parent Gini $=2(0.6)(0.4)=0.48$. Left child (pure): $G=2(1.0)(0)=0$. Right child: $G=2\left(\tfrac13\right)\left(\tfrac23\right)=\dfrac{4}{9}\approx0.4444$. Weighted child Gini $=\dfrac{20}{50}(0)+\dfrac{30}{50}(0.4444)=0+0.6(0.4444)=0.2667$. Reduction $=0.48-0.2667=0.2133\approx0.213$. Distractor (D) $0.267$ is the weighted child Gini itself, not the reduction; (E) $0.48$ is the parent Gini. The split's value is the *drop* in impurity.
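The impurity drop weights each child's Gini by its share of the parent's observations. A minimal two-class sketch, where `gini_binary` and `gini_reduction` are hypothetical names and each child is given as an (observations, class-1 proportion) pair:

```python
def gini_binary(p):
    """Two-class Gini index 2 p (1 - p)."""
    return 2 * p * (1 - p)

def gini_reduction(parent_p, children):
    """Parent Gini minus the observation-weighted average child Gini.
    children: list of (n_obs, class-1 proportion) pairs."""
    n = sum(n_k for n_k, _ in children)
    weighted = sum(n_k / n * gini_binary(p) for n_k, p in children)
    return gini_binary(parent_p) - weighted

print(round(gini_reduction(0.6, [(20, 1.0), (30, 1 / 3)]), 3))  # 0.213
```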