Muhammad Imdad Ullah - Statistics for Data Science & Analytics

Online Generative AI MCQs 9

Jun 10, 2025Jun 9, 2025 by Muhammad Imdad Ullah

Test your AI knowledge with Generative AI MCQs covering SQL query generation, data visualization, storytelling tools, bias detection, data augmentation, and ethical AI practices. Perfect for interviews, exams, and upskilling in AI, ChatGPT, and machine learning. Boost your expertise today! Ace exams, interviews, and AI projects with practical quiz questions! Let us start with the Online Generative AI MCQs now.

Online Generative AI MCQs with Answers

Which of the following SQL queries will be generated if you type “What is the count of rows in the Boston_house_price data set” in the prompt box of a generative AI SQL query tool?
Which is the most suitable prompt to generate a bar chart showing the average sales by area?
Which tool can create graphs and charts with GPT-4 through specialized prompt engineering?
You want to use a storytelling tool that enables you to use conversational prompts to enhance content creation and refine narrative elements, such as titles, headings, bullet points, introductions, and conclusions. Which tool would be most appropriate?
Which aspect of storytelling helps to give your narrative a clear and consistent structure?
What is a key consideration under model considerations?
Which technique can be used to check for biased loan approvals in the finance industry?
Which feature of ChatGPT is used in dashboard creation?
Which tool automatically cleans the inconsistencies and offers auto-analysis and customizable dashboards with GDPR/PDPA-compliant features?
Which storytelling tool converts text-based documents, PDFs, or URLs into slides?
Which aspect of storytelling helps to explain the relevance of data to the goals?
Which ethical technique should data scientists use to check the data for potential biases in the finance industry?
Training and running generative AI models can be computationally expensive, requiring specialized hardware and software infrastructure. Under which category does this challenge come?
Which generative AI tool uses data augmentation techniques to improve machine learning models’ performance?
How does generative AI enhance the extraction of insights from data?
Which of the following can enhance data preparation with generative AI?
Which of the following is the definition of ‘data augmentation’?
Which generative AI tool can augment structured data sets?
Which prompt can you use for the following query: SELECT AVG(AGE) FROM Boston_house_price
Which of the following tools can be used to augment image data sets by generating high-resolution, realistic images?

Special Values in R Language

Feature Selection in Machine Learning

Jun 2, 2025 by Muhammad Imdad Ullah

Learn key strategies for feature selection in machine learning with this comprehensive Q&A guide. Discover methods to identify important variables, handle datasets with >30% missing values, and tackle high-dimensional data ($p > n$). Understand why OLS fails and explore better alternatives like Lasso, XGBoost, and imputation techniques. Perfect for data scientists optimizing machine learning models! Let us start with feature selection in machine learning.

Feature Selection in Machine Learning

While working on a dataset, how do you select important variables? Discuss the methods.

Selecting important variables (feature selection in machine learning) is crucial for improving model performance, reducing overfitting, and enhancing interpretability. The following are the methods for feature selection in machine learning algorithms that can be used:

Filter Methods: Uses statistical measures to score features independently of the model.

Correlation (Pearson, Spearman): Select features/variables highly correlated with the target/ dependent variable.
Variance Threshold: Removes low-variance features.
Chi-square Test: For categorical target variables.
Mutual Information: Measures dependency between features and target.

Wrapper Method: Uses model performance to select the best subset of features.

Forward Selection: Starts with no features, adds one by one.
Backward Elimination: Starts with all features, removes the least significant.
Recursive Feature Elimination (RFE): Iteratively removes weakest features.

Embedded Methods: Feature selection is built into the model training process.

Lasso (L1 Regularization): Penalizes less important features to zero.
Decision Trees/Random Forest: Feature importance scores based on splits.
XGBoost/LightGBM: Built-in feature importance metrics.

Dimensionality Reduction: Transforms features into a lower-dimensional space.

PCA (Principal Component Analysis): Projects data into uncorrelated components.
t-SNE, UMAP: Non-linear techniques for visualization and feature reduction.

Hybrid Methods: Combines filter and wrapper methods for better efficiency (e.g., feature importance + RFE).

Suppose a dataset consisting of variables having more than 30% missing values? Out of 50 variables, 8 variables have missing values higher than 30%. How will you deal with them?

When dealing with variables that have more than 30% missing values, one needs a systematic approach to decide whether to keep, impute, or drop them.

Step 1: Analyze Missingness Pattern

Check if missingness is random (MCAR, MAR) or systematic (MNAR).
Use visualization (e.g., missingno matrix in Python) to detect patterns.

Step 2: Evaluate Variable Importance

If the variable is critical (a key predictor), consider imputation (but be cautious, as >30% missing data can introduce bias).
If the variable is unimportant, drop it to avoid noise and complexity.

Step 3: Handling Strategies

Option 1: Drop the Variables
- If the variables are not strongly correlated with the target.
- If domain knowledge suggests they are irrelevant.

The pros of dropping the variables simplify the dataset, avoid imputation bias. The cons of dropping the variables include, potential loss of useful information.

Option 2: Imputation
- For numerical variables:
  - Median/Mean imputation (if distribution is not skewed).
  - Predictive imputation (e.g., KNN, regression, or MICE).
- For categorical variables:
  - Mode imputation (most frequent category).
  - “Missing” as a new category (if missingness is informative).

The pros of imputation retain data, useful if the variable is important. The cons of imputation can distort relationships if missingness is not random.

Option 3: Flag Missingness & Impute
- Create a binary flag variable (1 if missing, 0 otherwise).
- Then impute the missing values (e.g., median/mode).
- Useful when missingness is informative (e.g., MNAR).

Option 4: Advanced Techniques
- Multiple Imputation (MICE): Good for MAR (Missing at Random).
- Model-based imputation (XGBoost, MissForest).
- Deep learning methods (e.g., GAIN, Autoencoders).

Step 4: Validate the Impact

Compare model performance before & after handling missing data.
Check if the imputation introduces bias.

Note that for feature selection in machine learning:

Drop if variables are not important (>30% missing is risky for imputation).
Impute + Flag if the variable is critical and missingness is meaningful.
Use advanced imputation (MICE, MissForest) if the data is MAR/MNAR.

You got a dataset to work with, having $p$ (number of variables) > $n$ (number of observations). Why is OLS a bad option to work with? Which techniques would be best to use? Why?

When the number of variables ($p$) exceeds the number of observations ($n$), OLS regression fails because:

No Unique Solution
- OLS requires solving $(X^TX)^{−1}X^Ty$, but when $p>n$, $X^TX$ is singular (non-invertible).
- Infinite possible solutions exist, leading to overfitting.
High Variance & Overfitting: With too many predictors, OLS fits noise rather than true patterns, making predictions unreliable.
Multicollinearity Issues: Many predictors are often correlated, making coefficient estimates unstable.

In summary, in high-dimensional data sets, one cannot use classical regression techniques, since their assumptions tend to fail. When $p > n$, we can no longer calculate a unique least squares coefficient estimate; the variances become infinite, so OLS cannot be used at all.

Feature Selection in Machine Learning variable selection missing observations

Better Techniques for $p>n$ Problems

Regularized Regression (Shrinkage Methods): They penalize large coefficients, reducing overfitting.
- Ridge Regression (L2 Penalty): Shrinks coefficients but never to zero, and works well when many predictors are relevant.
- Lasso Regression (L1 Penalty): Forces some coefficients to exactly zero, performing feature selection. Lasso regression is best when only a few predictors matter.
- Elastic Net (L1 + L2 Penalty): Combines Ridge & Lasso advantages. It is useful when there are correlated predictors.
Dimension Reduction Techniques: They reduce pp by transforming variables into a lower-dimensional space.
- Principal Component Regression (PCR): Uses PCA to reduce dimensions before regression.
- Partial Least Squares (PLS): Like PCR, but considers the target variable in projection.
Tree-Based & Ensemble Methods: They handle high-dimensional data well by selecting important features.
- Random Forest / XGBoost / LightGBM: Automatically perform feature selection and are robust to multicollinearity.
Bayesian Methods: Bayesian Ridge Regression uses priors to stabilize coefficient estimates.
Sparse Regression Techniques:
- Stepwise Regression (Forward/Backward): Iteratively selects the best subset of features.
- Least Angle Regression (LARS): Efficiently handles high-dimensional data.

Which Method is Best for variable selection in machine learning?

Scenario	Best Technique	Reason
Most features are relevant	Ridge Regression	Prevents overfitting without eliminating variables
Only a few features matter	Lasso / Elastic Net	Performs feature selection
Highly correlated features	Elastic Net / PCR / PLS	Handles multicollinearity
Non-linear relationships	Random Forest / XGBoost	Captures complex patterns
Interpretability needed	Lasso + Stability Selection	Identifies key predictors

Learn about How to Save Data in R Language

MS Excel Dashboard MCQs 14

Jun 1, 2025May 31, 2025 by Muhammad Imdad Ullah

Are you an MS Excel pro or just getting started with dashboard creation? This 35-question MS Excel Dashboard MCQs Quiz will test your knowledge on MS Excel Dashboards. The MS Excel Dashboard MCQs Test covers the topics:

✅ Dashboard Design Principles (layout, colors, interactivity)
✅ PivotTables & Pivot Charts (dynamic data representation)
✅ Slicers & Filters (making dashboards interactive)
✅ Charts & Visualizations (best chart types for trends, KPIs)
✅ Conditional Formatting & Excel Tables (keeping data updated)
✅ Macros & Hyperlinks (automating dashboard actions)

Who Should Take This MS Excel Dashboard MCQs Quiz?
✔ Data Analysts looking to refine dashboard skills
✔ Excel Users who want to build professional reports
✔ Students & Professionals preparing for interviews
✔ Anyone who loves Excel challenges!

Please go to MS Excel Dashboard MCQs 14 to view the test

Online MS Excel Dashboard MCQs Quiz With Answers

Online MS Excel Dashboard MCQs with Answers

What are some good dashboard design principles?
What are some usual considerations regarding the size of a dashboard?
If your cells in Excel are white with a thin grey outline, one quick way to create a dashboard with a white background is to turn off the gridlines.
Coloured shapes are more useful placeholders of information in dashboards than colouring the background of cells, as shapes are not restricted to the column width and row height proportions.
When using an Align tool, such as Align Right, how does Excel determine the alignment of the elements we have highlighted?
You can place a chart into a dashboard by going to our PivotChart Design tab and selecting Move Chart.
It is important to edit or format your chart before you move into your dashboard.
One difference between Pivot Charts and normal charts is that you cannot edit a Pivot Chart to be linked to a different data set.
You can link a regular chart to a PivotTable.
If you want to cut/copy, and paste multiple elements, a useful tool is
Dragging the fill handle down to create multiple charts from one that you have already created is possible if you are using
One way to add interactivity to your dashboards is by adding
What will copying and pasting a shape to one of the columns or bars in a chart do?
It is not good practice to use Pivot Charts based on calculated fields to a dashboard
To create a slicer, you need to select
The number of slicers that Excel will allow us to create depends on the
To make your dashboard interactive, each element of your dashboard will need to have its own slicer.
The slicer style has to be the same style as you have chosen for your Excel worksheets to appear. For example, if your worksheet windows have grey outlines, so will your slicer.
If you want to link your slicer to multiple dashboard elements, you should go to:
How can we choose not to display the Column letters: “A, B, C…” and Row numbers “1, 2, 3…”?
Suppose that you place a shape to cover an entire chart. Now, what would happen if you changed the outline and fill to No Outline and No Fill, and you activate a link/hyperlink for the shape?
It is best practice to add a refresh button to a dashboard that contains pivots because
Once a dashboard has been created, it is best practice to change colours using Colors rather than Themes because:
When assigning macros to a dashboard, each macro can relate to only one element of the dashboard.
What is an Excel Dashboard?
Which Excel feature is most commonly used to create interactive dashboards?
What type of chart is best for showing trends over time in a dashboard?
Which tool in Excel allows users to filter dashboard data interactively?
What is the purpose of using Conditional Formatting in a dashboard?
Which Excel function is useful for summarizing data in dashboards?
What does a KPI (Key Performance Indicator) represent in a dashboard?
Which feature helps in creating mini-charts within a cell for dashboards?
Why is it important to avoid clutter in an Excel dashboard?
Which tool connects a dashboard to external data sources?
What is the primary benefit of using Excel Tables (Ctrl + T) as a data source for dashboards?

Summarizing Data in R Base Package

Online Generative AI MCQs with Answers

Share this:

Table of Contents

Feature Selection in Machine Learning

While working on a dataset, how do you select important variables? Discuss the methods.

Suppose a dataset consisting of variables having more than 30% missing values? Out of 50 variables, 8 variables have missing values higher than 30%. How will you deal with them?

You got a dataset to work with, having $p$ (number of variables) > $n$ (number of observations). Why is OLS a bad option to work with? Which techniques would be best to use? Why?

Share this:

Online MS Excel Dashboard MCQs with Answers

Share this: