Curious about dimensionality reduction in machine learning? This post answers key questions: What is dimension reduction? How do PCA, KPCA, and ICA work? Should you remove correlated variables before PCA? Is rotation necessary in PCA? It is aimed at students, researchers, data analysts, and ML practitioners looking to master feature extraction, interpretability, and efficient modeling. Learn best practices and avoid common pitfalls of dimensionality reduction in machine learning.
What is Dimension Reduction in Machine Learning?
Dimensionality Reduction in Machine Learning is the process of reducing the number of input features (variables) in a dataset while preserving its essential structure and information. It simplifies data without losing critical patterns, making ML models more efficient and interpretable. Dimensionality reduction in machine learning is used because it:
- Removes Redundancy: Eliminates correlated or irrelevant features/variables
- Fights Overfitting: Simplifies models by reducing noise
- Speeds up Training: Fewer dimensions mean faster computation
- Improves Visualization: Projects data into 2D/3D for better understanding.
The common techniques for dimensionality reduction in machine learning are:
- PCA (Principal Component Analysis): Linear projection maximizing variance (a short R sketch follows this list)
- t-SNE (t-Distributed Stochastic Neighbour Embedding): Non-linear, good for visualization
- Autoencoders (Neural Networks): Learn compact representations.
- UMAP (Uniform Manifold Approximation and Projection): Preserves global & local structure.
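As a quick illustration of the first technique, here is a minimal R sketch that runs PCA on the built-in iris data set (chosen purely for illustration), inspects the variance explained, and projects the observations into 2D for visualization.

```r
# Minimal PCA sketch on the built-in iris data set (illustration only)
data(iris)
X <- iris[, 1:4]                       # four numeric features

# Center and scale the features, then compute the principal components
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
summary(pca)

# Project the data onto the first two components for a 2D visualization
plot(pca$x[, 1], pca$x[, 2],
     col = iris$Species, pch = 19,
     xlab = "PC1", ylab = "PC2",
     main = "Iris projected onto the first two principal components")
```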
The uses of dimensionality reduction in machine learning are:
- Image compression (for example, reducing pixel dimensions)
- Anomaly detection (by isolating key features)
- Text data (for example, topic modeling via LDA)
What are PCA, KPCA, and ICA used for?
PCA (Principal Component Analysis), KPCA (Kernel Principal Component Analysis), and ICA (Independent Component Analysis) are dimensionality reduction (feature extraction) techniques widely used in machine learning, data analysis, and signal processing (a short R sketch of all three follows the list below).
- PCA (Principal Component Analysis): reduces dimensionality by transforming data into a set of linearly uncorrelated variables (principal components) while preserving maximum variance. Its key uses are:
- Dimensionality Reduction: Compresses high-dimensional data while retaining most information.
- Data Visualization: Projects data into 2D/3D for easier interpretation.
- Noise Reduction: Removes less significant components that may represent noise.
- Feature Extraction: Helps in reducing multicollinearity in regression/classification tasks.
- Assumptions: Linear relationships, Gaussian-distributed data.
- KPCA (Kernel Principal Component Analysis): A nonlinear extension of PCA that uses kernel methods to capture complex, nonlinear structure. Its key uses are:
- Nonlinear Dimensionality Reduction: Handles data with nonlinear relationships.
- Feature Extraction in High-Dimensional Spaces: Useful in image, text, and bioinformatics data.
- Pattern Recognition: Detects hidden structures in complex datasets.
- Advantage: Works well where PCA fails due to nonlinearity.
- Kernel Choices: RBF, polynomial, sigmoid, etc.
- ICA (Independent Component Analysis): Separates mixed signals into statistically independent components (blind source separation). Its key uses are:
- Signal Processing: Separating audio (cocktail party problem), EEG, fMRI signals.
- Denoising: Isolating meaningful signals from noise.
- Feature Extraction: Finding hidden factors in data.
- Assumptions: Components are statistically independent and non-Gaussian.
Note that Principal Component Analysis finds uncorrelated components, and ICA finds independent ones.
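To make the contrast concrete, here is a minimal R sketch applying all three techniques to the same standardized data. It assumes the add-on packages kernlab (for kernel PCA) and fastICA are installed; the RBF kernel and its sigma value are arbitrary choices for illustration.

```r
# Minimal sketch comparing PCA, KPCA, and ICA in R
library(kernlab)   # kernel PCA
library(fastICA)   # independent component analysis

data(iris)
X <- scale(iris[, 1:4])               # standardized numeric features

# PCA: linear, uncorrelated components ordered by variance explained
pca <- prcomp(X)
head(pca$x[, 1:2])

# KPCA: nonlinear extension using an RBF kernel (sigma chosen for illustration)
kpc <- kpca(X, kernel = "rbfdot", kpar = list(sigma = 0.1), features = 2)
head(rotated(kpc))                    # data projected onto the kernel PCs

# ICA: statistically independent (non-Gaussian) components
ica <- fastICA(X, n.comp = 2)
head(ica$S)                           # estimated independent components
```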
Should you remove correlated variables before PCA?
No, you should not remove correlated variables before running PCA, because:
- PCA Handles Correlation Automatically
- PCA works by transforming the data into uncorrelated principal components (PCs).
- It inherently identifies and combines correlated variables into fewer components while preserving variance.
- Removing Correlated Variables Manually Can Lose Information
- If you drop correlated variables first, you might discard useful variance that PCA could have captured.
- PCA’s strength is in summarizing correlated variables efficiently rather than requiring manual preprocessing.
- PCA Prioritizes High-Variance Directions
- Since correlated variables often share variance, PCA naturally groups them into dominant components.
- Removing them early might weaken the resulting principal components.
- When Should You Preprocess Before PCA?
- Scale Variables (if features are in different units) → PCA is sensitive to variance magnitude.
- Remove Near-Zero Variance Features (if some variables are constants).
- Handle Missing Values (PCA cannot handle NaNs directly).
Therefore, do not remove correlated variables before Principal Component Analysis; let PCA handle them. Instead, focus on standardizing data (if needed) and ensuring no missing values exist.
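A minimal R sketch of these preprocessing steps, using a small hypothetical data frame df (the values are made up for illustration):

```r
# Hypothetical data frame with missing values and a constant column
df <- data.frame(x1 = c(2.1, 3.4, NA, 5.0, 4.2),
                 x2 = c(110, 150, 120, NA, 135),
                 x3 = c(1, 1, 1, 1, 1))        # near-zero variance column

df <- df[complete.cases(df), ]                  # PCA cannot handle NAs directly
df <- df[, apply(df, 2, var) > 1e-8]            # drop (near-)constant columns

# Standardize so variables on large scales do not dominate the components
pca <- prcomp(df, center = TRUE, scale. = TRUE)
summary(pca)
```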
Keep in mind, however, that correlated variables do influence the PCA output: in their presence, the variance explained by a particular component gets inflated.
Suppose a data set contains three variables, two of which are correlated. Running Principal Component Analysis on this data set would yield a first principal component that explains roughly twice the variance it would explain if all variables were uncorrelated. Adding more correlated variables leads PCA to place even more weight on that shared direction, which can be misleading when interpreting the variance explained.
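This inflation effect is easy to verify with a quick simulation in R. Below, two of three variables are constructed to be highly correlated, and the proportion of variance explained by each component is compared with a fully uncorrelated baseline (the seed and sample size are arbitrary):

```r
# Correlated variables inflate the variance explained by the first component
set.seed(123)
n  <- 1000
v1 <- rnorm(n)
v2 <- v1 + rnorm(n, sd = 0.1)         # strongly correlated with v1
v3 <- rnorm(n)                        # independent of the others

pca_corr   <- prcomp(cbind(v1, v2, v3), scale. = TRUE)
pca_uncorr <- prcomp(cbind(rnorm(n), rnorm(n), rnorm(n)), scale. = TRUE)

# With two correlated variables, PC1 absorbs roughly two-thirds of the variance;
# with three uncorrelated variables, each PC explains roughly one-third.
summary(pca_corr)$importance[2, ]
summary(pca_uncorr)$importance[2, ]
```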
Is rotation necessary in PCA? If yes, why? What will happen if you do not rotate the components?
Rotation is optional but often beneficial; it improves interpretability without losing information.
Why Rotate PCA Components?
- Simplifies Interpretation
- PCA components are initially uncorrelated but may load on many variables, making them hard to explain.
- Rotation (e.g., Varimax for orthogonal rotation) forces loadings toward 0 or ±1, creating “simple structure.”
- Example: A rotated component might represent only 2-3 variables instead of many weakly loaded ones.
- Enhances Meaningful Patterns
- Unrotated components maximize variance but may mix multiple underlying factors.
- Rotation aligns components closer to true latent variables (if they exist).
- Preserves Variance Explained
- Rotation redistributes variance among components but keeps total variance unchanged.
What Happens If You Do Not Rotate?
- Harder to Interpret: Components may have many moderate loadings, making it unclear which variables dominate.
- Less Aligned with Theoretical Factors: Unrotated components are mathematically optimal (max variance) but may not match domain-specific concepts.
- No Statistical Harm: Unrotated PCA is still valid for dimensionality reduction—just less intuitive for human analysis.
When to Rotate?
- Rotate if your goal is interpretability (e.g., identifying clear feature groupings in psychology, biology, or market research). There is no need to rotate if you only care about dimensionality reduction (e.g., preprocessing for ML models).
Therefore, orthogonal rotation is worthwhile because it sharpens the contrast in how variance is captured across the components, which makes them easier to interpret. That, after all, is the motive for doing Principal Component Analysis: selecting fewer components (than original features) that explain the maximum variance in the data set. Rotation does not change the relative positions of the observations; it only changes the coordinates of the points with respect to the rotated axes. If we do not rotate the components, the interpretive benefit of PCA diminishes, and we may end up examining a larger number of components to make sense of the variance in the data set.
Rotation does not change PCA’s mathematical validity but significantly improves interpretability for human analysis. Skip it only if you are using PCA purely for algorithmic purposes (e.g., input to a classifier).
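As a sketch of what an orthogonal rotation looks like in practice, the following R snippet applies the built-in varimax() function to the first two PCA loadings of the mtcars data set (chosen purely as an example); the rotated loadings tend to move closer to 0 or ±1, which is the "simple structure" described above.

```r
# Varimax rotation of PCA loadings (illustration on the built-in mtcars data)
data(mtcars)
pca <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Unrotated loadings: many variables load moderately on each component
raw_loadings <- pca$rotation[, 1:2]
round(raw_loadings, 2)

# Varimax pushes loadings toward 0 or +/-1, redistributing variance
# between the two components while keeping the total unchanged
rot <- varimax(raw_loadings)
round(unclass(rot$loadings), 2)
```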
Simulation in the R Language