{
  "deckName": "Exam SRM — Principal Components Analysis",
  "examCode": "Exam SRM",
  "cards": [
    {
      "front": "What problem does **principal components analysis (PCA)** solve, and is it supervised or unsupervised?",
      "back": "PCA is an **unsupervised** technique: there is no response $Y$, only the $p$ features. Its goal is **dimension reduction** — summarizing a large set of correlated variables by a small number of new variables (the principal components) that capture as much of the data's variation as possible.\nThe components are uncorrelated directions, ordered so the first carries the most variance. They are used for visualization, to remove multicollinearity, and as derived predictors (PCR).",
      "tag": "Principal components"
    },
    {
      "front": "Write the **first principal component** $Z_1$ as a linear combination of the features and state the constraint on its coefficients.",
      "back": "$Z_1=\\phi_{11}X_1+\\phi_{21}X_2+\\dots+\\phi_{p1}X_p=\\sum_{j=1}^{p}\\phi_{j1}X_j$.\nThe coefficients $\\phi_{j1}$ are the **loadings**, collected in the loading vector $\\phi_1=(\\phi_{11},\\dots,\\phi_{p1})^{\\top}$.\nThey are **normalized** so that $\\sum_{j=1}^{p}\\phi_{j1}^{2}=1$. Without this constraint the variance could be inflated arbitrarily by scaling up the loadings.",
      "tag": "Principal components"
    },
    {
      "front": "In words, what defines the **first principal component direction**?",
      "back": "Among all normalized linear combinations of the (centered) features, $Z_1$ is the one with the **largest sample variance** — it is the direction in feature space along which the data vary most.\nEquivalently, it is the line closest to the data in the sense that it minimizes the sum of squared perpendicular distances from the points to the line.",
      "tag": "Principal components"
    },
    {
      "front": "How is the **second** principal component defined relative to the first?",
      "back": "$Z_2=\\sum_{j=1}^{p}\\phi_{j2}X_j$ is the normalized linear combination of maximal variance **subject to being uncorrelated with $Z_1$**.\nUncorrelatedness of the scores is equivalent to the loading vector $\\phi_2$ being **orthogonal** to $\\phi_1$ ($\\phi_1^{\\top}\\phi_2=0$). Each later component is the max-variance direction orthogonal to all earlier ones.",
      "tag": "Principal components"
    },
    {
      "front": "Why must the features be **centered** (mean-subtracted) before computing principal components?",
      "back": "PCA finds directions of maximal **variance**, which is measured about the mean. If the data are not centered, the leading direction would be pulled toward the overall mean vector rather than reflecting the spread of the data.\nSo each column is replaced by $X_j-\\bar X_j$ before the components are formed; the components then pass through the centroid of the cloud.",
      "tag": "Principal components"
    },
    {
      "front": "Define the **loadings** and the **scores** in PCA and how they relate to the data matrix.",
      "back": "**Loadings** $\\phi_{jm}$ are the weights defining each component; the loading vector $\\phi_m$ is the **eigenvector** of the covariance (or correlation) matrix for the $m$-th largest eigenvalue. Loadings tell you which variables a component represents.\n**Scores** $z_{im}$ are the values of the component for each observation: $z_{im}=\\sum_{j=1}^{p}\\phi_{jm}\\,x_{ij}$ (on centered data). Scores are the **projections** of the observations onto the loading direction.",
      "tag": "Loadings & scores"
    },
    {
      "front": "How are the principal-component loadings obtained from the **covariance (or correlation) matrix**?",
      "back": "The loading vectors are the **eigenvectors** of the sample covariance matrix $\\Sigma$ (or the correlation matrix when the data are standardized). The eigenvector for the largest eigenvalue is $\\phi_1$, for the second-largest is $\\phi_2$, and so on.\nThe corresponding **eigenvalue** $\\lambda_m$ equals the variance of the $m$-th score $Z_m$. The eigenvectors are orthonormal, matching the normalization and orthogonality of the components.",
      "tag": "Loadings & scores"
    },
    {
      "front": "The first loading vector for two standardized variables is $\\phi_1=(0.6,0.8)^{\\top}$. Verify it is normalized and compute the **score** of an observation with standardized values $x_1=1.5$, $x_2=-0.5$.",
      "back": "Normalization check: $0.6^{2}+0.8^{2}=0.36+0.64=1$. Good — it is a unit vector.\nScore: $z_1=\\phi_{11}x_1+\\phi_{21}x_2=0.6(1.5)+0.8(-0.5)=0.90-0.40=0.50$.\nSo this observation projects to $0.50$ along the first principal component.",
      "tag": "Loadings & scores"
    },
    {
      "front": "An observation has centered values $(x_1,x_2,x_3)=(2,-1,3)$ and the first loading vector is $\\phi_1=(0.5,0.5,0.707107)^{\\top}$. Compute its **first-component score**.",
      "back": "$z_1=0.5(2)+0.5(-1)+0.707107(3)=1.0-0.5+2.121321=2.621321$.\nQuick normalization check on the loadings: $0.5^{2}+0.5^{2}+0.707107^{2}=0.25+0.25+0.5=1.0$, so $\\phi_1$ is a valid unit loading vector.\nThe first-component score is about $2.62$.",
      "tag": "Loadings & scores"
    },
    {
      "front": "Loadings $\\phi_1=(0.707107,0.707107)^{\\top}$ and $\\phi_2=(0.707107,-0.707107)^{\\top}$. Confirm the components are **orthogonal** and find the second score for centered $(x_1,x_2)=(4,2)$.",
      "back": "Orthogonality: $\\phi_1^{\\top}\\phi_2=0.707107(0.707107)+0.707107(-0.707107)=0.5-0.5=0$. The loading vectors are orthogonal, so $Z_1$ and $Z_2$ are uncorrelated.\nSecond score: $z_2=0.707107(4)+(-0.707107)(2)=2.828428-1.414214=1.414214$.\n(For reference $z_1=0.707107(4)+0.707107(2)=4.242642$.)",
      "tag": "Loadings & scores"
    },
    {
      "front": "How do you interpret the **sign and magnitude** of a loading $\\phi_{jm}$?",
      "back": "The **magnitude** $|\\phi_{jm}|$ shows how strongly variable $X_j$ contributes to component $m$ — large loadings identify the variables that the component summarizes.\nThe **sign** shows the direction of association: variables with same-sign loadings move together along that component; opposite signs mean the component contrasts those variables.\nNote the overall sign of a whole loading vector is arbitrary — flipping all signs gives an equally valid component (scores just flip sign).",
      "tag": "Loadings & scores"
    },
    {
      "front": "What is the relationship between the **eigenvalue** $\\lambda_m$ and the variance of component $Z_m$?",
      "back": "The eigenvalue $\\lambda_m$ of the covariance/correlation matrix **is** the sample variance of the $m$-th principal-component score: $\\operatorname{Var}(Z_m)=\\lambda_m$.\nBecause components are ordered by decreasing eigenvalue, $\\lambda_1\\ge\\lambda_2\\ge\\dots\\ge\\lambda_p\\ge 0$, so $Z_1$ has the largest variance, $Z_2$ the next, and so on.",
      "tag": "Variance explained"
    },
    {
      "front": "State the **proportion of variance explained (PVE)** by the $m$-th principal component in terms of eigenvalues.",
      "back": "$\\text{PVE}_m=\\dfrac{\\lambda_m}{\\sum_{k=1}^{p}\\lambda_k}=\\dfrac{\\operatorname{Var}(Z_m)}{\\text{total variance}}$.\nThe denominator $\\sum_k\\lambda_k$ is the total variance of all the variables (the trace of the covariance/correlation matrix). Each PVE lies in $[0,1]$ and they sum to $1$ across all $p$ components.",
      "tag": "Variance explained"
    },
    {
      "front": "Why does $\\sum_{k=1}^{p}\\lambda_k$ equal the **total variance**, and what does it equal for **standardized** data?",
      "back": "The eigenvalues sum to the **trace** of the covariance/correlation matrix, which is the sum of the variables' variances — the total variance in the data.\nFor **standardized** variables PCA uses the correlation matrix, whose diagonal entries are all $1$. With $p$ variables the trace is $p$, so $\\sum_{k=1}^{p}\\lambda_k=p$. Then $\\text{PVE}_m=\\lambda_m/p$.",
      "tag": "Variance explained"
    },
    {
      "front": "A PCA on $4$ variables gives eigenvalues $\\lambda=(2.5,\\,1.0,\\,0.4,\\,0.1)$. Compute the **PVE of each component**.",
      "back": "Total variance $=2.5+1.0+0.4+0.1=4.0$ (as expected for $4$ standardized variables).\n$\\text{PVE}_1=2.5/4.0=0.625$ (62.5%).\n$\\text{PVE}_2=1.0/4.0=0.250$ (25.0%).\n$\\text{PVE}_3=0.4/4.0=0.100$ (10.0%).\n$\\text{PVE}_4=0.1/4.0=0.025$ (2.5%).\nThey sum to $1$, confirming the calculation.",
      "tag": "Variance explained"
    },
    {
      "front": "For the same eigenvalues $\\lambda=(2.5,1.0,0.4,0.1)$, find the **cumulative PVE** of the first two components.",
      "back": "Cumulative PVE of components $1$–$2$ $=\\dfrac{\\lambda_1+\\lambda_2}{\\sum_k\\lambda_k}=\\dfrac{2.5+1.0}{4.0}=\\dfrac{3.5}{4.0}=0.875$.\nSo the first two principal components together explain **87.5%** of the total variance — usually enough to justify reducing the four variables to two components.",
      "tag": "Variance explained"
    },
    {
      "front": "Eigenvalues from a $5$-variable correlation-matrix PCA are $\\lambda=(2.8,1.1,0.6,0.3,0.2)$. How many components are needed to explain at least **90%** of the variance?",
      "back": "Total $=2.8+1.1+0.6+0.3+0.2=5.0$ (matches $p=5$).\nCumulative PVE:\nPC1: $2.8/5=0.560$.\nPC1–2: $3.9/5=0.780$.\nPC1–3: $4.5/5=0.900$.\nThree components reach exactly $90\\%$, so you need the **first three** principal components.",
      "tag": "Choosing components"
    },
    {
      "front": "A covariance-matrix PCA (variables **not** standardized) gives eigenvalues $\\lambda=(40,8,2)$. Find the PVE of the first component.",
      "back": "Here the eigenvalues need not sum to $p$ — they sum to the total (unstandardized) variance.\nTotal $=40+8+2=50$.\n$\\text{PVE}_1=40/50=0.80$, i.e. the first component explains $80\\%$ of the total variance.\nThe first two together: $(40+8)/50=48/50=0.96$, or $96\\%$.",
      "tag": "Variance explained"
    },
    {
      "front": "What does a **scree plot** show, and how do you use the **\"elbow\"** to choose the number of components?",
      "back": "A scree plot graphs each component's variance explained (its PVE, or eigenvalue) against the component number, from largest to smallest.\nYou look for an **elbow** — the point after which the curve flattens and additional components add little. Keep the components before the elbow and discard the rest, since the flat tail represents mostly noise.\nIt is a visual, somewhat subjective rule; cumulative-PVE thresholds (e.g. $80$–$90\\%$) are a common complement.",
      "tag": "Choosing components"
    },
    {
      "front": "List the common rules for **deciding how many principal components to keep**.",
      "back": "1. **Cumulative-PVE threshold:** keep enough components to explain a target share (e.g. $80\\%$ or $90\\%$) of total variance.\n2. **Scree-plot elbow:** keep components up to the kink where the curve levels off.\n3. **Kaiser rule (correlation PCA):** keep components with $\\lambda_m>1$ — each must explain more than a single standardized variable.\n4. In **PCR**, choose the number of components by cross-validation against predictive error.\nThe choice is partly judgment; there is no single optimal answer.",
      "tag": "Choosing components"
    },
    {
      "front": "Under the **Kaiser (eigenvalue-greater-than-one) rule**, how many components do you keep from correlation-matrix eigenvalues $\\lambda=(2.8,1.1,0.6,0.3,0.2)$?",
      "back": "The Kaiser rule (valid for **correlation-matrix** PCA, where each variable contributes variance $1$) keeps components with $\\lambda_m>1$.\nHere $\\lambda_1=2.8>1$ and $\\lambda_2=1.1>1$, but $\\lambda_3=0.6<1$. So keep the **first two** components.\nRationale: a component with $\\lambda<1$ explains less variance than one original standardized variable, so it is not worth retaining.",
      "tag": "Choosing components"
    },
    {
      "front": "Why are variables usually **standardized** before PCA, and what changes if you do not?",
      "back": "Variance is **scale-dependent**: a variable measured in large units (e.g. dollars) has a huge raw variance and would dominate the first component purely because of its units.\nStandardizing (subtract mean, divide by SD) puts every variable on variance $1$, so PCA runs on the **correlation matrix** and components reflect genuine co-variation, not measurement scale.\nWithout standardization you do PCA on the **covariance matrix**, which is appropriate only when all variables share comparable units/scales.",
      "tag": "Scaling"
    },
    {
      "front": "When is it acceptable (or preferable) to run PCA on the **covariance matrix** without standardizing?",
      "back": "When the variables are already in the **same units and on comparable scales**, so their relative variances are meaningful and you want larger-variance variables to count more (e.g. all measurements in the same currency, or pixel intensities).\nIn that case standardizing would throw away real information about which variables vary most. Otherwise — mixed units or very different magnitudes — standardize and use the correlation matrix.",
      "tag": "Scaling"
    },
    {
      "front": "Two variables have variances $100$ and $1$ and covariance $5$. Why would an **unstandardized** PCA be misleading here, and what fixes it?",
      "back": "The covariance matrix is $\\begin{pmatrix}100 & 5\\\\ 5 & 1\\end{pmatrix}$. The first variable's variance ($100$) swamps the second's ($1$), so the leading component is almost entirely the first variable — driven by its scale, not by any real structure.\n**Fix:** standardize. The correlation is $\\rho=\\frac{5}{\\sqrt{100}\\sqrt{1}}=0.5$, giving correlation matrix $\\begin{pmatrix}1 & 0.5\\\\ 0.5 & 1\\end{pmatrix}$, on which both variables contribute equally.",
      "tag": "Scaling"
    },
    {
      "front": "For the standardized two-variable correlation matrix $\\begin{pmatrix}1 & 0.5\\\\ 0.5 & 1\\end{pmatrix}$, find the **eigenvalues** and the PVE of the first component.",
      "back": "For a $2\\times 2$ correlation matrix with off-diagonal $\\rho$, the eigenvalues are $1+\\rho$ and $1-\\rho$.\nHere $\\lambda_1=1+0.5=1.5$ and $\\lambda_2=1-0.5=0.5$.\nCheck: they sum to $2=p$. $\\text{PVE}_1=\\frac{1.5}{2}=0.75$, so the first component explains $75\\%$ of the (standardized) variance.",
      "tag": "Variance explained"
    },
    {
      "front": "For the same $2\\times 2$ correlation matrix $\\begin{pmatrix}1 & 0.5\\\\ 0.5 & 1\\end{pmatrix}$, find the **first loading vector**.",
      "back": "By symmetry the eigenvector for $\\lambda_1=1+\\rho$ is proportional to $(1,1)$. Normalizing to unit length divides by $\\sqrt{1^{2}+1^{2}}=\\sqrt{2}$:\n$\\phi_1=\\left(\\tfrac{1}{\\sqrt2},\\tfrac{1}{\\sqrt2}\\right)=(0.707107,\\,0.707107)$.\nThe second eigenvector (for $\\lambda_2=1-\\rho$) is $\\phi_2=(0.707107,\\,-0.707107)$ — orthogonal to $\\phi_1$.",
      "tag": "Loadings & scores"
    },
    {
      "front": "What is a **biplot** in PCA, and what do its two overlaid elements represent?",
      "back": "A biplot displays a PCA on a single set of axes (usually PC1 vs PC2):\n- **Points** are the observation **scores** projected onto the first two components.\n- **Arrows** (vectors) are the variable **loadings** — each original variable is drawn from the origin with coordinates given by its loadings on PC1 and PC2.\nIt lets you read clusters of observations and the variables driving the components at the same time.",
      "tag": "Loadings & scores"
    },
    {
      "front": "How do you read the **loading arrows** on a biplot?",
      "back": "An arrow's **direction** shows which component(s) a variable aligns with: an arrow pointing along the PC1 axis loads mainly on PC1.\nArrows pointing in **similar directions** correspond to positively correlated variables; arrows roughly **opposite** correspond to negatively correlated variables; arrows at right angles are roughly uncorrelated.\nAn arrow's **length** reflects how well that variable is represented by the two plotted components.",
      "tag": "Loadings & scores"
    },
    {
      "front": "What is **principal components regression (PCR)**, and how does it use PCA?",
      "back": "PCR is a regression method that first runs PCA on the predictors, then fits an **ordinary least squares** regression of the response $Y$ on the first $M$ principal-component scores $Z_1,\\dots,Z_M$ instead of on the original $p$ predictors.\nBecause the components capture most of the predictors' variation in few dimensions, PCR reduces dimensionality and **multicollinearity**, often lowering variance at the cost of a little bias.",
      "tag": "PCR & applications"
    },
    {
      "front": "What key **assumption** underlies PCR, and when can it fail?",
      "back": "PCR assumes that the directions in which the **predictors vary most** (the leading principal components) are also the directions most associated with the **response** $Y$.\nThis usually holds but is not guaranteed: a low-variance component (dropped by PCR) could still be the one that predicts $Y$. When that happens PCR performs poorly, and a supervised method like **partial least squares** — which uses $Y$ in forming components — may do better.",
      "tag": "PCR & applications"
    },
    {
      "front": "Is PCR a method that performs **feature selection**? Explain.",
      "back": "No. Each principal component is a linear combination of **all** $p$ original predictors, so even using only $M<p$ components, every original variable still enters the model through the loadings.\nPCR therefore **shrinks/regularizes** the coefficient space (like ridge regression in spirit) rather than selecting a subset of features. Methods such as the lasso, not PCR, do genuine feature selection.",
      "tag": "PCR & applications"
    },
    {
      "front": "How is the **number of components $M$** chosen in PCR, and what is the effect of moving $M$ from small to $p$?",
      "back": "$M$ is typically chosen by **cross-validation**, picking the $M$ that minimizes estimated test error.\n- Small $M$: large bias, small variance (a very reduced model).\n- $M=p$: PCR reproduces ordinary least squares on all predictors (no dimension reduction, no benefit).\nThe sweet spot is an intermediate $M$ that captures the predictive structure while discarding noisy low-variance directions.",
      "tag": "PCR & applications"
    },
    {
      "front": "Why must predictors be **standardized before PCR** (just as in PCA)?",
      "back": "PCR builds its components from a PCA of the predictors, so it inherits PCA's scale sensitivity: without standardizing, high-variance (large-unit) predictors dominate the components regardless of their relevance.\nStandardizing each predictor to mean $0$, variance $1$ ensures the components reflect correlation structure rather than units, so the regression on the scores is not distorted by an arbitrary choice of measurement scale.",
      "tag": "Scaling"
    },
    {
      "front": "A PCR model regresses $Y$ on the first two component scores: $\\hat Y=12+3Z_1-2Z_2$. The loadings are $\\phi_1=(0.6,0.8)$ and $\\phi_2=(0.8,-0.6)$. Predict $Y$ for standardized predictors $(x_1,x_2)=(1,-1)$.",
      "back": "First compute the scores from the loadings:\n$Z_1=0.6(1)+0.8(-1)=0.6-0.8=-0.2$.\n$Z_2=0.8(1)+(-0.6)(-1)=0.8+0.6=1.4$.\nThen plug into the fitted equation:\n$\\hat Y=12+3(-0.2)-2(1.4)=12-0.6-2.8=8.6$.\nThe predicted response is $8.6$.",
      "tag": "PCR & applications"
    },
    {
      "front": "Eigenvalues from a $6$-variable correlation PCA are $\\lambda=(3.0,1.5,0.6,0.5,0.3,0.1)$. Build the **cumulative-PVE table** and apply an $80\\%$ threshold.",
      "back": "Total $=3.0+1.5+0.6+0.5+0.3+0.1=6.0$ (matches $p=6$).\nPC1: $3.0/6=0.500$ → cum $0.500$.\nPC2: $1.5/6=0.250$ → cum $0.750$.\nPC3: $0.6/6=0.100$ → cum $0.850$.\nPC4: $0.5/6=0.083$ → cum $0.933$.\nThe first component to push cumulative PVE past $80\\%$ is **PC3** (cum $85.0\\%$), so keep **three** components under an $80\\%$ rule.",
      "tag": "Choosing components"
    },
    {
      "front": "A PCA reports that PC1 explains $45\\%$ and PC2 explains $30\\%$ of total variance, with a total variance of $20$. Find the **eigenvalues** $\\lambda_1$ and $\\lambda_2$.",
      "back": "Since $\\text{PVE}_m=\\lambda_m/\\sum_k\\lambda_k$ and $\\sum_k\\lambda_k=20$:\n$\\lambda_1=0.45\\times 20=9.0$.\n$\\lambda_2=0.30\\times 20=6.0$.\nThe first two components carry variance $9.0+6.0=15.0$ out of $20$, i.e. cumulative PVE $=15/20=0.75$ ($75\\%$).",
      "tag": "Variance explained"
    },
    {
      "front": "Compute the **PVE of the first component** directly from the scores: the first-component scores for $5$ observations are $z_{i1}=(-3,-1,0,1,3)$ and the total variance of all variables is $6.0$.",
      "back": "The variance explained by PC1 equals the (sample) variance of its scores. The scores have mean $0$.\nSum of squares $=(-3)^{2}+(-1)^{2}+0^{2}+1^{2}+3^{2}=9+1+0+1+9=20$.\nUsing the $1/n$ population convention, $\\operatorname{Var}(Z_1)=20/5=4.0=\\lambda_1$.\n$\\text{PVE}_1=\\dfrac{4.0}{6.0}\\approx 0.667$, so PC1 explains about $66.7\\%$ of the total variance.",
      "tag": "Variance explained"
    },
    {
      "front": "State two important **limitations** of PCA to keep in mind.",
      "back": "1. **Interpretability:** components are linear blends of all variables, so they often lack a clean real-world meaning, making results harder to explain than original-variable models.\n2. **Unsupervised:** PCA ignores any response $Y$; the highest-variance directions need not be the most predictive ones. It is also sensitive to scaling and to outliers, and it only captures **linear** structure.",
      "tag": "Principal components"
    },
    {
      "front": "Eigenvalues $\\lambda=(4.2,1.3,0.9,0.4,0.2)$ from a $5$-variable PCA. Compute the variance **lost** if you keep only the first **two** components.",
      "back": "Total variance $=4.2+1.3+0.9+0.4+0.2=7.0$.\nVariance retained by PC1–2 $=4.2+1.3=5.5$, so cumulative PVE $=5.5/7.0\\approx 0.786$ ($78.6\\%$).\nVariance **lost** (the dropped components) $=0.9+0.4+0.2=1.5$, a proportion $1.5/7.0\\approx 0.214$, i.e. about $21.4\\%$ of the total variance is discarded.",
      "tag": "Choosing components"
    },
    {
      "front": "Explain why the principal components are **orthogonal/uncorrelated** and why that property is useful.",
      "back": "The loading vectors are eigenvectors of a symmetric matrix (covariance or correlation), and eigenvectors of a symmetric matrix for distinct eigenvalues are **orthogonal**. Orthogonal loadings make the score variables **uncorrelated**: $\\operatorname{Cov}(Z_m,Z_{m'})=0$ for $m\\ne m'$.\nUsefulness: the total variance splits cleanly across components (so PVEs add up), and using scores as predictors removes multicollinearity in PCR.",
      "tag": "Principal components"
    },
    {
      "front": "A standardized $3$-variable PCA gives first loadings $\\phi_1=(0.58,0.58,0.58)$ approximately. What does an (almost) **equal-weight** first component tell you about the variables?",
      "back": "Equal positive loadings (each near $1/\\sqrt3\\approx0.577$) mean PC1 is essentially an **average / overall-size** dimension — all three variables move together, so the dominant source of variation is their common level.\nA later component with mixed signs (e.g. $(0.71,-0.71,0)$) would then represent a **contrast** between variables. This pattern (size then shape) is common when variables are strongly positively correlated.",
      "tag": "Loadings & scores"
    },
    {
      "front": "Covariance-matrix eigenvalues are $\\lambda=(60,25,10,5)$ (variables in the same units, not standardized). Find the cumulative PVE through **three** components and state how many to keep for a $90\\%$ target.",
      "back": "Total $=60+25+10+5=100$, so PVEs are conveniently the eigenvalues as percents.\nPC1: $60\\%$ → cum $60\\%$.\nPC1–2: $85\\%$ → cum $85\\%$.\nPC1–3: $95\\%$ → cum $95\\%$.\nThree components reach $95\\%\\ge 90\\%$ (two give only $85\\%$), so keep the **first three** components.",
      "tag": "Choosing components"
    },
    {
      "front": "Given first-component loadings $\\phi_1=(0.5,0.5,0.5,0.5)$ for four standardized variables, compute the PC1 scores for two observations $A=(1,1,1,1)$ and $B=(2,0,-1,1)$.",
      "back": "Normalization check: $4\\times 0.5^{2}=4(0.25)=1$, so $\\phi_1$ is a unit vector.\nScore of $A$: $0.5(1+1+1+1)=0.5(4)=2.0$.\nScore of $B$: $0.5(2+0-1+1)=0.5(2)=1.0$.\nSince all loadings are equal and positive, the PC1 score is just $0.5$ times the sum of the (standardized) variables — an overall-magnitude index.",
      "tag": "Loadings & scores"
    },
    {
      "front": "Distinguish **PCA** from **clustering** as unsupervised methods.",
      "back": "Both are unsupervised, but they answer different questions.\n**PCA** seeks a **low-dimensional representation** of the observations that captures most of the variance — it summarizes the variables/directions.\n**Clustering** seeks **subgroups** of observations that are similar to each other — it partitions the rows.\nPCA reduces dimensions (columns/directions); clustering groups observations (rows). They are often used together (e.g. cluster on the leading principal-component scores).",
      "tag": "Principal components"
    }
  ]
}