Classification in Data Mining

This post is about Classification in Data Mining. It is presented in question-and-answer form for ease of understanding and learning classification techniques and their real-life applications.

What is Classification in Data Mining? Explain with Examples.

Classification in data mining is a supervised learning technique used to categorize data into predefined classes or labels based on input feature data. The classification technique is widely used in various applications, such as spam detection, image recognition, sentiment analysis, and medical diagnosis.

The following are some real-life examples that make use of classification algorithms:

  • A bank loan officer may need to analyze loan data to determine which customers are risky and which are safe.
  • A marketing manager may need to predict whether a customer with a given profile will buy a new product.
  • Banks and financial institutions use classification algorithms to identify potentially fraudulent transactions by classifying them as “Fraudulent” or “Legitimate” transactions based on transaction patterns.
  • Mobile apps and digital assistants use classification algorithms to convert handwritten text into digital format by identifying and classifying individual characters or words.
  • News channels and companies use classification algorithms to categorize their articles into different sections (such as Sports, Politics, Business, Technology, etc.) based on the content of the articles.
  • Businesses analyze customer reviews, feedback, and social media posts to classify sentiments as “Positive,” “Negative,” or “Neutral,” helping them gauge public perception about their products or services.

What is the Goal of Classification?

Classification aims to develop a model that can accurately predict the class of unseen instances based on patterns learned from a training dataset.

Write about the Key Components of Classification.

Key components of classification in Data Mining are:

  1. Training Data: A dataset where the class labels are known, which will be used to train the classification model.
  2. Model: An algorithm (such as decision trees, neural networks, support vector machines, etc.) that learns to distinguish between different classes based on the training data.
  3. Features: The input variables or attributes that are used to make predictions about the class labels.
  4. Prediction: Once a model is trained, the model can classify new, unseen instances by assigning them to one of the predefined classes.
  5. Evaluation: The performance of the classification model can be assessed using metrics such as accuracy, precision, recall, F1 score, and the confusion matrix, as sketched below.
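The following is a minimal R sketch, using made-up confusion-matrix counts, of how these evaluation metrics are computed for a binary classifier; the counts are illustrative assumptions, not real results.

# Hypothetical confusion-matrix counts for a binary classifier
TP <- 40; FP <- 10; FN <- 5; TN <- 45

accuracy  <- (TP + TN) / (TP + TN + FP + FN)                # fraction of all cases classified correctly
precision <- TP / (TP + FP)                                 # fraction of predicted positives that are correct
recall    <- TP / (TP + FN)                                 # fraction of actual positives that are found
f1        <- 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

round(c(accuracy = accuracy, precision = precision, recall = recall, F1 = f1), 3)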

Why Classification is Needed?

In today’s world of Big Data, large datasets are becoming the norm. For example, imagine a dataset/database of many terabytes: Facebook alone crunches about 4 petabytes of data every single day. The primary challenge of big data is how to make sense of it, and sheer volume is not the only problem: big data also tends to be diverse, unstructured, and fast-changing.

Similarly, consider audio and video data, social media posts, 3D data, or geospatial data. These kinds of data are not easy to categorize or organize. Classification helps by assigning each item to one of a set of meaningful, predefined classes, imposing structure that makes such data easier to search and analyze.


Name Some Methods of Classification

The following are some popular classification methods:

  • Statistical procedure-based approaches
  • Machine learning-based approaches
  • Neural networks
  • ID3 algorithm
  • C4.5 algorithm
  • Nearest neighbour algorithm
  • Naive Bayes algorithm
  • Support vector machine (SVM) algorithm
  • Artificial neural network (ANN) algorithm
  • Decision trees
  • SenseClusters (an adaptation of the K-means clustering algorithm)

Explain ID3 Algorithm

The ID3 (Iterative Dichotomiser 3) algorithm is a decision tree learning algorithm, primarily used for classification tasks in data mining and machine learning.

What are the Key Features of ID3 Classification?

  • Categorical Attributes: The ID3 algorithm is designed to work primarily with categorical attributes. It does not handle continuous attributes directly, but they can be converted into categorical ones through binning.
  • Information Gain: The algorithm uses information gain as a criterion to select the attribute that best separates the data into different classes. Information gain measures the reduction in entropy (uncertainty) after a dataset is split on a specific attribute (a minimal sketch of this calculation appears after this list).
  • Recursive Tree Building: The ID3 classification algorithm builds the decision tree recursively, splitting the data into subsets based on attribute values.
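To make the information-gain criterion concrete, here is a minimal R sketch of ID3's attribute-selection step. The entropy() and info_gain() helpers and the toy data frame are hypothetical illustrations, not part of any particular library.

# Entropy (in bits) of a vector of class labels
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  -sum(p * log2(p))
}

# Information gain from splitting `data` on `attribute`, for class column `class_col`
info_gain <- function(data, attribute, class_col) {
  H <- entropy(data[[class_col]])                       # entropy before the split
  splits <- split(data, data[[attribute]])              # one subset per attribute value
  H_cond <- sum(sapply(splits, function(s)
    nrow(s) / nrow(data) * entropy(s[[class_col]])))    # weighted entropy after the split
  H - H_cond
}

# Toy (made-up) training data: should we play outside?
d <- data.frame(
  outlook = c("sunny", "sunny", "rain", "rain", "overcast", "overcast"),
  windy   = c("yes", "no", "yes", "no", "yes", "no"),
  play    = c("no", "no", "no", "yes", "yes", "yes")
)
info_gain(d, "outlook", "play")   # ID3 splits on the attribute with the largest gain
info_gain(d, "windy", "play")

ID3 evaluates the gain for every candidate attribute, splits on the winner, and then repeats the same calculation recursively within each resulting subset.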


Design of Experiments Quiz 8

This is an online quiz about the Design of Experiments, with questions and answers. There are 20 MCQs in this DOE Quiz, covering the basics of the design of experiments, hypothesis testing, basic principles, single-factor experiments, fixed effect models, and random effect models. Let us start with the Design of Experiments Quiz Questions with Answers.

Design of Experiments Quiz with Answers


1. A researcher is interested in measuring the rate of production of five particular machines. The model will be a:

2. If the experiment were to be repeated and the same set of treatments would be included, we choose:

3. One of the ANOVA assumptions is that treatments have:

4. If an interaction effect in a factorial design is significant, the main effects of the factors involved in that interaction may be difficult to interpret.

5. In a random effect model:

6. In a fixed effect model:

7. For ANOVA, we assume that treatments are applied to the experimental units:

8. Factorial experiments cannot be used to detect the presence of interaction.

9. A fixed effect model is used when the effect of —————– is assumed to be fixed during the experiment.

10. To compare the IQ level of five students, a series of tests is planned and IQ is computed based on their results. The model will be:

11. The experimenter is interested in treatment means only. The model used is called:

12. One-factor ANOVA means there is only:

13. In a random effects model, ————- are randomly chosen from a large population.

14. The treatment effect is associated with:

15. If the treatments in a particular experiment are a random sample from a large population of similar treatments, we choose:

16. An interaction term in a factorial model with quantitative factors introduces curvature in the response surface representation of the results.

17. If the experimenter is interested in the variation among treatment means, not the treatment means themselves, the model used is called:

18. For one-factor ANOVA, the model contains:

19. A factorial experiment can be run as an RCBD by assigning the runs from each replicate to separate blocks.

20. Single-factor ANOVA is also called:



Consistency: A Property of Good Estimator

Consistency refers to the property of an estimator that as the sample size increases, the estimator converges in probability to the true value of the parameter being estimated. In other words, a consistent estimator will yield results that become more accurate and stable as more data points are collected.

Characteristics of a Consistent Estimator

A consistent estimator has some important characteristics:

  • Convergence: The estimator will produce values that get closer to the true parameter value with larger samples.
  • Reliability: Provides reassurance that the estimates will be valid as more data is accounted for.

Examples of Consistent Estimators

  1. Sample Mean ($\overline{x}$): The sample mean is a consistent estimator of the population mean ($\mu$). The mean of a larger sample lies closer, on average, to the actual population mean than the mean of a smaller one (see the simulation after this list).
  2. Sample Proportion ($\hat{p}$): The sample proportion is also a consistent estimator of the true population proportion. As the number of observations increases, the sample proportion gets closer to the true population proportion.
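As a quick illustration of the first example, the following R sketch (with arbitrary made-up values $\mu = 5$ and $\sigma = 2$) simulates 1000 samples at each of several sample sizes; the spread of the sample means shrinks as $n$ grows, which is consistency in action.

set.seed(123)
mu <- 5; sigma <- 2                  # arbitrary true parameters for the demo
for (n in c(10, 100, 10000)) {
  # 1000 independent sample means, each computed from a sample of size n
  xbar <- replicate(1000, mean(rnorm(n, mean = mu, sd = sigma)))
  cat("n =", n, ": mean of estimates =", round(mean(xbar), 3),
      ", SD of estimates =", round(sd(xbar), 4), "\n")
}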

Question: $\hat{\theta}$ is a consistent estimator of the parameter $\theta$ of a given population if

  1. $\hat{\theta}$ is unbiased, and
  2. $var(\hat{\theta}) \rightarrow 0$ when $n\rightarrow \infty$

Answer: Suppose $X$ is a random variable with mean $\mu$ and variance $\sigma^2$. If $X_1,X_2,\cdots,X_n$ is a random sample from $X$, then

\begin{align*}
E(\overline{X}) &= \mu\\
Var(\overline{X}) & = \frac{\sigma^2}{n}
\end{align*}

That is, $\overline{X}$ is unbiased, and $\lim\limits_{n\rightarrow\infty} Var(\overline{X}) = \lim\limits_{n\rightarrow\infty} \frac{\sigma^2}{n} = 0$, so both conditions are satisfied and $\overline{X}$ is a consistent estimator of $\mu$.

Question: Show that the sample mean $\overline{X}$ of a random sample of size $n$ from the density function $f(x; \theta) = \frac{1}{\theta} e^{-\frac{x}{\theta}}, \qquad 0<x<\infty$ is a consistent estimator of the parameter $\theta$.

Answer: First, we need to check that $E(\overline{X})=\theta$, that is, that the sample mean $\overline{X}$ is unbiased. Since $E(\overline{X}) = E(X)$, it suffices to compute $E(X)$.

\begin{align*}
E(X) &= \mu = \int x\cdot f(x; \theta)\, dx = \int\limits_{0}^{\infty}x\cdot \frac{1}{\theta} e^{-\frac{x}{\theta}}\,dx\\
&= \frac{1}{\theta} \int\limits_{0}^{\infty} xe^{-\frac{x}{\theta}}\,dx\\
&= \frac{1}{\theta} \left[ \Big(-\theta x e^{-\frac{x}{\theta}}\Big)\Big|_{0}^{\infty} + \theta \int\limits_{0}^{\infty} e^{-\frac{x}{\theta}}\,dx \right]\\
&= \frac{1}{\theta} \left[0+\theta\Big(-\theta e^{-\frac{x}{\theta}}\Big)\Big|_0^{\infty} \right] = \frac{1}{\theta}\,\theta^2 = \theta\\
E(X^2) &= \int x^2 f(x; \theta)\,dx = \int\limits_{0}^{\infty}x^2\, \frac{1}{\theta} e^{-\frac{x}{\theta}}\,dx\\
&= \frac{1}{\theta}\left[ \Big(-\theta x^2 e^{-\frac{x}{\theta}}\Big)\Big|_{0}^{\infty} + \int\limits_0^\infty 2x\theta\, e^{-\frac{x}{\theta}}\,dx \right]\\
&= \frac{1}{\theta} \left[ 0 + 2\theta^2 \int\limits_0^\infty \frac{x}{\theta} e^{-\frac{x}{\theta}}\,dx\right]
\end{align*}

The remaining integral is exactly $E(X)$, which equals $\theta$. Thus

\begin{align*}
E(X^2) &=\frac{1}{\theta}\, 2\theta^2\, \theta = 2\theta^2\\
Var(X) &= E(X^2) - [E(X)]^2 = 2\theta^2 - \theta^2 = \theta^2\\
\text{and}\quad Var(\overline{X}) &= \frac{Var(X)}{n} = \frac{\theta^2}{n}\\
\lim\limits_{n\rightarrow \infty} Var(\overline{X}) &= \lim\limits_{n\rightarrow \infty} \frac{\theta^2}{n} = 0
\end{align*}

Since $\overline{X}$ is unbiased and $Var(\overline{X})$ approaches 0 as $n\rightarrow \infty$, $\overline{X}$ is a consistent estimator of $\theta$.
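The result can also be checked by simulation. The short R sketch below uses an arbitrary $\theta = 3$; note that rexp() in R is parameterized by the rate, which is $1/\theta$ for this density.

set.seed(1)
theta <- 3                                   # arbitrary true parameter for the demo
for (n in c(10, 1000, 100000)) {
  xbar <- mean(rexp(n, rate = 1/theta))      # sample mean of n draws from f(x; theta)
  cat("n =", n, ": sample mean =", round(xbar, 3), "\n")
}
# The sample means settle near theta = 3 as n grows, matching the proof above.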

Importance of Consistency in Statistics

The following are a few key points about the importance of consistency in statistics:

Reliable Inferences: Consistent estimators ensure that as sample size increases, the estimates become closer and closer to the true population value/parameters. This helps researchers and statisticians to make sound inferences about a population based on sample data.

Foundation for Hypothesis Testing: Most statistical methods rely on consistent estimators. Consistency helps validate the conclusions drawn from statistical tests, leading to confidence in decision-making.

Improved Accuracy: As the sample size increases and more data points become available, the estimates converge more closely to the true value. This leads to more accurate statistical models, which can improve analysis and predictions.

Mitigating Sampling Error: Consistent estimators help to reduce the impact of random sampling error. As sample sizes increase, the variability in estimates tends to decrease, leading to more dependable conclusions.

Building Statistical Theory: Consistency is a fundamental concept in the development of statistical theory. It provides a rigorous foundation for designing and validating statistical methods and procedures.

Trust in Results: Consistency builds trust in the findings of statistical analyses. Because the results are stable and reliable across different (large) samples, decision-makers are more likely to accept and act upon them.

Framework for Model Development: In statistics and data science, developing models based on consistent estimators results in more accurate models.

Long-Term Decision Making: Consistency in data interpretation supports long-term planning, risk assessment, and resource allocation, since businesses and organizations often make strategic decisions based on statistical analyses.

