Exam SRM — Clustering & KNN Practice Flashcards

Thirty exam-realistic multiple-choice problems on SOA Exam SRM clustering and nearest-neighbor methods — K-means centroid updates, reassignments and within-cluster sum of squares, hierarchical merges under complete/single/average/centroid linkage, dendrogram heights and cuts, KNN classification votes and regression averages, and the bias-variance role of $K$ plus feature standardization — each with a fully worked solution.

Free sample: 8 of the 30 problems
  1. K-means
    In K-means clustering, the total within-cluster variation that the algorithm minimizes is most commonly defined as which of the following? (A) $\sum_{k=1}^{K}\sum_{i\in C_k}\lVert x_i-\bar x_k\rVert$ (B) $\sum_{k=1}^{K}\sum_{i\in C_k}\lVert x_i-\bar x_k\rVert^{2}$ (C) $\sum_{k=1}^{K}\max_{i\in C_k}\lVert x_i-\bar x_k\rVert^{2}$ (D) $\sum_{k=1}^{K}\lvert C_k\rvert$ (E) $\sum_{k=1}^{K}\sum_{i\in C_k}\lVert x_i-\bar x_k\rVert^{2}$ minimized over all $K$ simultaneously
    **Answer: (B).** K-means partitions the data into $K$ non-overlapping clusters $C_1,\dots,C_K$ to minimize $\sum_{k=1}^{K} W(C_k)$, where the standard within-cluster variation is the sum of **squared** Euclidean distances of each point to its cluster centroid: $W(C_k)=\sum_{i\in C_k}\lVert x_i-\bar x_k\rVert^{2}$. Choice (A) drops the square (it uses raw distance, not squared). Choice (C) uses only the farthest point, not the full sum. Choice (D) counts cluster sizes, not variation. Choice (E) is wrong because $K$ is **specified in advance**, not optimized jointly — the objective always shrinks to $0$ as $K\to n$, so it cannot be minimized over $K$.
  2. K-means
    In a K-means step the current centroids are $\bar x_A=(1,1)$ and $\bar x_B=(4,3)$. To which cluster is the point $(3,2)$ assigned? (A) Cluster $A$, because its squared distance to $A$ is $5$ (B) Cluster $A$, because $(3,2)$ is closer to the origin (C) Cluster $B$, because its squared distance to $B$ is $2$ (D) Cluster $B$, because its squared distance to $B$ is $5$ (E) Either cluster; the point is equidistant from both centroids
    **Answer: (C).** Assign the point to the nearest centroid. Comparing **squared** Euclidean distances is enough, because squaring preserves the ordering of nonnegative distances. To $A$: $(3-1)^{2}+(2-1)^{2}=4+1=5$. To $B$: $(3-4)^{2}+(2-3)^{2}=1+1=2$. Since $2<5$, $(3,2)$ is assigned to cluster $B$, with squared distance $2$. Choice (A) reports the (larger) squared distance to $A$; choice (D) misstates the squared distance to $B$ as $5$.
  3. K-means
    A cluster consists of the points $(0,0)$, $(4,0)$, $(0,3)$, and $(4,3)$. Compute its within-cluster sum of squares $W(C_k)=\sum_{i\in C_k}\lVert x_i-\bar x_k\rVert^{2}$. (A) $12.5$ (B) $14.0$ (C) $18.0$ (D) $25.0$ (E) $50.0$
    **Answer: (D).** Centroid: $\bar x=\left(\dfrac{0+4+0+4}{4},\dfrac{0+0+3+3}{4}\right)=(2,1.5)$. Squared distances to $(2,1.5)$: $(0,0)$: $(-2)^{2}+(-1.5)^{2}=4+2.25=6.25$. $(4,0)$: $2^{2}+(-1.5)^{2}=4+2.25=6.25$. $(0,3)$: $(-2)^{2}+1.5^{2}=4+2.25=6.25$. $(4,3)$: $2^{2}+1.5^{2}=4+2.25=6.25$. $W=4(6.25)=25.0$. (Distractor (A), $12.5$, is half the correct total, which is what summing over only two of the four points would give; distractor (E), $50.0$, is double it.)
  4. K-means
    With $K=2$, the current clusters are $A=\{(1,0),(3,0)\}$ and $B=\{(8,0),(12,0)\}$. After one centroid update, the point $(3,0)$ is reassigned to the cluster with the nearest centroid. Where does it go? (A) Cluster $A$, whose updated centroid is $(2,0)$ (B) Cluster $A$, whose updated centroid is $(4,0)$ (C) Cluster $B$, whose updated centroid is $(10,0)$ (D) Cluster $B$, whose updated centroid is $(8,0)$ (E) It is equidistant and stays put by tie-break
    **Answer: (A).** Update the centroids as cluster means: $\bar x_A=\left(\dfrac{1+3}{2},0\right)=(2,0)$ and $\bar x_B=\left(\dfrac{8+12}{2},0\right)=(10,0)$. Reassign $(3,0)$: squared distance to $A=(3-2)^{2}=1$; to $B=(3-10)^{2}=49$. Since $1<49$, it stays in cluster $A$, whose updated centroid is $(2,0)$. Choice (C) gives the correct $B$ centroid but the wrong assignment.
  5. K-means
    Six points lie on the real line at $1,2,3,8,9,10$. With $K=2$ and initial centroids $\bar x_A=2$ and $\bar x_B=9$, perform one full K-means iteration (assign, then recompute centroids). What are the updated centroids? (A) $\bar x_A=2,\ \bar x_B=9$ (B) $\bar x_A=1.5,\ \bar x_B=9$ (C) $\bar x_A=2,\ \bar x_B=9.5$ (D) $\bar x_A=2,\ \bar x_B=9$ but clusters swap (E) $\bar x_A=3,\ \bar x_B=8$
    **Answer: (A).** Assign each point to the nearer initial centroid ($2$ or $9$). Points $1,2,3$ are closer to $2$; points $8,9,10$ are closer to $9$. So $A=\{1,2,3\}$, $B=\{8,9,10\}$. Recompute means: $\bar x_A=\dfrac{1+2+3}{3}=2$ and $\bar x_B=\dfrac{8+9+10}{3}=9$. The centroids are unchanged at $2$ and $9$, so the algorithm has already converged. (Choices that shift a centroid assume a point like $3$ or $8$ was misassigned to the far cluster.)
  6. Hierarchical clustering
    Three single-point clusters lie on a line at $P=0$, $Q=2$, $R=7$. Under **single linkage**, list the order of merges and the height of the final merge. (A) Merge $P,Q$ at height $2$; final merge at height $5$ (B) Merge $P,Q$ at height $2$; final merge at height $7$ (C) Merge $Q,R$ at height $5$; final merge at height $7$ (D) Merge $P,R$ at height $7$; final merge at height $5$ (E) Merge $P,Q$ at height $2$; final merge at height $4.5$
    **Answer: (A).** Pairwise distances: $d(P,Q)=2$, $d(Q,R)=5$, $d(P,R)=7$. The smallest is $d(P,Q)=2$, so $P$ and $Q$ merge **first at height $2$**. Under single linkage the distance from $R$ to cluster $\{P,Q\}$ is the **minimum** pairwise distance: $\min(d(P,R),d(Q,R))=\min(7,5)=5$. So the final merge occurs at **height $5$**. Complete linkage would instead give $\max(7,5)=7$ (distractor B); average linkage would give $\frac{7+5}{2}=6$.
  7. Linkage & dissimilarity
    Clusters $G=\{a,b\}$ and $H=\{c,d\}$ have cross-cluster distances $d(a,c)=4$, $d(a,d)=6$, $d(b,c)=2$, $d(b,d)=8$. Compute the **average-linkage** dissimilarity between $G$ and $H$. (A) $2$ (B) $5$ (C) $6$ (D) $8$ (E) $20$
    **Answer: (B).** Average linkage is the mean of all $|G|\cdot|H|=2\cdot2=4$ cross-cluster pairwise distances: $d(G,H)=\dfrac{4+6+2+8}{4}=\dfrac{20}{4}=5$. For contrast, single linkage gives $\min=2$ (distractor A), complete linkage gives $\max=8$ (distractor D), and forgetting to divide by $4$ leaves the sum $20$ (distractor E).
  8. Linkage & dissimilarity
    Cluster $G$ has centroid $(1,2)$ and cluster $H$ has centroid $(7,10)$. Compute the **centroid-linkage** dissimilarity between $G$ and $H$. (A) $6$ (B) $8$ (C) $10$ (D) $14$ (E) $100$
    **Answer: (C).** Centroid linkage uses the Euclidean distance between the two cluster centroids: $d(\bar x_G,\bar x_H)=\sqrt{(7-1)^{2}+(10-2)^{2}}=\sqrt{6^{2}+8^{2}}=\sqrt{36+64}=\sqrt{100}=10$. Adding the coordinate gaps without squaring gives $6+8=14$ (distractor D); forgetting the square root leaves $100$ (distractor E).
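
The arithmetic in the sample cards above can be checked with a few lines of Python. This is a minimal sketch, not part of the deck: the helper names (`centroid`, `sq_dist`, `nearest`, `wcss`, `kmeans_step`, `linkage`) are hypothetical, it assumes Euclidean distance throughout and Python 3.8+ (for `math.dist` and `statistics.fmean`), and it covers only the cards that give explicit coordinates.

```python
from itertools import product
from math import dist
from statistics import fmean


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(fmean(coords) for coords in zip(*points))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Label of the centroid closest to `point` (centroids: label -> tuple)."""
    return min(centroids, key=lambda label: sq_dist(point, centroids[label]))


def wcss(points):
    """Within-cluster sum of squared distances to the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def kmeans_step(points, centroids):
    """One assign-then-update iteration; returns (clusters, new_centroids)."""
    clusters = {label: [] for label in centroids}
    for p in points:
        clusters[nearest(p, centroids)].append(p)
    # Assumes no cluster ends up empty, which holds for the card data.
    return clusters, {label: centroid(ps) for label, ps in clusters.items()}


def linkage(G, H, how):
    """Dissimilarity between clusters G and H under the named linkage."""
    if how == "centroid":
        return dist(centroid(G), centroid(H))
    pairs = [dist(g, h) for g, h in product(G, H)]
    return {"single": min, "complete": max, "average": fmean}[how](pairs)


# Card 2: (3, 2) against centroids A = (1, 1) and B = (4, 3).
print(nearest((3, 2), {"A": (1, 1), "B": (4, 3)}))            # B

# Card 3: within-cluster sum of squares of the four rectangle corners.
print(wcss([(0, 0), (4, 0), (0, 3), (4, 3)]))                  # 25.0

# Card 4: update both centroids, then reassign (3, 0).
cents = {"A": centroid([(1, 0), (3, 0)]), "B": centroid([(8, 0), (12, 0)])}
print(cents["A"], cents["B"], nearest((3, 0), cents))          # (2.0, 0.0) (10.0, 0.0) A

# Card 5: one full iteration on the line; the centroids do not move.
line = [(x,) for x in (1, 2, 3, 8, 9, 10)]
clusters, updated = kmeans_step(line, {"A": (2,), "B": (9,)})
print(updated)                                                 # {'A': (2.0,), 'B': (9.0,)}

# Card 6: distance from R = 7 to the merged cluster {P, Q} = {0, 2}.
PQ, R = [(0,), (2,)], [(7,)]
print([linkage(PQ, R, h) for h in ("single", "complete", "average")])  # [5.0, 7.0, 6.0]

# Card 8: centroid linkage between single-point stand-ins for G and H.
print(linkage([(1, 2)], [(7, 10)], "centroid"))                # 10.0
```

Card 7 supplies only the four cross-cluster distances rather than coordinates, so its average linkage is simply `fmean([4, 6, 2, 8])`, which is `5.0`. Card 8 supplies only the two centroids, so each cluster is represented here by a single point at its centroid.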