Statistics for Data Science & Analytics - MCQs, Software & Data Analysis

Essential SAS Interview Questions

Apr 28, 2025 / Apr 26, 2025 by Muhammad Imdad Ullah

This blog post covers essential SAS interview questions to help aspiring data analysts and SAS programmers prepare for technical interviews. It explains core concepts like the basic elements of a SAS program, creating permanent datasets, the role of the DATA step, and how SAS informats work. Each question is answered concisely with practical examples, making it a quick yet comprehensive guide for interview preparation related to SAS Programming.

Essential SAS Interview Questions

What are the Basic Elements needed to run a SAS Program?

To run a SAS program, the following basic elements are needed:

SAS Software – Install SAS (Base SAS, SAS Studio, or SAS University Edition).
SAS Program – A SAS Programming Script containing:
- DATA Step – To create or modify datasets.
- PROC Step – To analyze or process data (e.g., PROC PRINT, PROC MEANS).
Input Data – Can be internal (directly in the program) or external (CSV, Excel, etc.).
Output – Procedure Results (logs, reports, or new datasets).
SAS Environment – A workspace (SAS Display Manager, SAS Studio, or Enterprise Guide).

A SAS program must also follow these basic rules:

Every statement ends with a semicolon.
A DATA statement names the data set being created.
An INPUT statement defines the variables when raw data is read.
A RUN statement marks the end of the step.
Words and statements are separated by at least one space.

How do you create a Permanent SAS Data Set?

To create a permanent SAS dataset, one must:

Assign a Library – Use the LIBNAME statement to link a folder where the dataset will be stored.
Reference the Library – Prefix the dataset name with the library name.

Example of Creating a Permanent Dataset in SAS

LIBNAME mylib "C:\SAS\Data";  /* Define a library */  

DATA mylib.permanent_data;  /* Creates a permanent dataset */  
   INPUT ID Name $ Age;  
   DATALINES;  
1 imdad 45  
2 Usman 30  
3 Ali 24
;  
RUN;

The following are key points to note:

The dataset (permanent_data) is saved in the specified folder (C:\SAS\Data) and remains there after the SAS session ends.
Without a library reference, SAS stores datasets temporarily in the WORK library (deleted after the session).

To access the data for later use:

LIBNAME mylib "C:\SAS\Data";  
PROC PRINT DATA=mylib.permanent_data;  
RUN;

What is the DATA step in SAS?

In SAS, the DATA step is a fundamental programming component used to:

Create or Modify Datasets – Read, transform, and manipulate data.
Process Raw Data – Import external files (CSV, Text, and Excel) or create data internally.
Perform Calculations & Conditional Logic – Using SAS functions, loops (DO-END), and IF-THEN-ELSE statements.
Clean & Prepare Data – Handle missing values, recode variables, merge datasets, etc.

Key Features of the DATA Step:

Begins with the DATA statement (which names the dataset).
Uses INPUT to define variables when reading raw data.
Can include SET, MERGE, UPDATE, or INFILE to work with existing datasets or external files.
Ends with RUN; (or implicitly at the next DATA or PROC statement).

The DATA step is important because:

It is the core of SAS data manipulation.
It is used before most PROC (procedure) steps for analysis and reporting.
It allows complex data transformations before analysis.

What is a SAS Data Set?

A SAS dataset is a structured data file used in SAS programming, organized in a table format with:

Rows (Observations) – Represent individual records (e.g., customers, transactions).
Columns (Variables) – Represent data attributes (e.g., ID, Name, Age).

The key Features of a SAS Data Set are:

Stored in Libraries –
- Temporary: WORK library (deleted after session).
- Permanent: Saved in a user-defined library (e.g., LIBNAME mylib "C:\Data";).
Two Parts:
- Descriptor (metadata like variable names, types).
- Data (actual values).
File Extension: .sas7bdat for datasets, .sas7bcat for catalogs.

A SAS dataset is used for data storage, manipulation, and analysis in SAS procedures (PROC steps).

What are SAS informats?

SAS informats are instructions that tell SAS how to read raw data (for example, from files or datalines) and convert it into the values stored in SAS variables. They define how SAS interprets input data (numbers, dates, text, etc.).

The key features of SAS informats are:

Used in INPUT statements (DATA step) or with INFILE/INFORMAT statements.
Syntax: INFORMAT variable_name <informat>; or embedded in INPUT.
Common types:
- Numeric: 8. (standard numeric), COMMA9. (with commas like 1,000).
- Character: $10. (reads 10 characters).
- Date/Time: DATE9. (e.g., 01JAN2023), MMDDYY10. (e.g., 01/01/2023).

The following is an example of SAS informats.

DATA example;  
   INPUT @1 Name $10.  @12 DOB MMDDYY10. @23 Salary COMMA9.;  
   DATALINES;  
Imdad      01/01/1990 50,000  
Usman      12/15/1985 75,000  
;  
RUN;

$10. reads 10-character text.
MMDDYY10. reads dates in MM/DD/YYYY format.
COMMA9. reads numbers with commas (e.g., 50,000).
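
To make the conversion concrete, the following is a rough Python analogy (not SAS code) of what the three informats above do to one fixed-column record. The helper names read_char, read_mmddyy10, and read_comma are hypothetical and exist only for this illustration; the real informats handle more cases (for example, COMMA9. also strips dollar signs). The one SAS fact relied on is that SAS stores a date as the number of days since January 1, 1960.

```python
from datetime import date, datetime

SAS_EPOCH = date(1960, 1, 1)  # SAS date values count days from this date


def read_char(text: str, width: int) -> str:
    """Rough stand-in for $10.: take `width` characters and strip blanks."""
    return text[:width].strip()


def read_mmddyy10(text: str) -> int:
    """Rough stand-in for MMDDYY10.: 'MM/DD/YYYY' text -> days since 1960-01-01."""
    parsed = datetime.strptime(text.strip(), "%m/%d/%Y").date()
    return (parsed - SAS_EPOCH).days


def read_comma(text: str) -> float:
    """Rough stand-in for COMMA9. on simple input: drop commas, then convert."""
    return float(text.strip().replace(",", ""))


# Same layout as the DATALINES record: name in columns 1-10,
# date in columns 12-21, salary starting at column 23.
record = "Imdad      01/01/1990 50,000"
name = read_char(record[0:10], 10)
dob = read_mmddyy10(record[11:21])
salary = read_comma(record[22:31])

print(name, dob, salary)  # Imdad 10958 50000.0
```

The date prints as 10958 because that is how many days separate 01/01/1960 from 01/01/1990; in SAS, a FORMAT such as DATE9. would be needed to display the stored value as a readable date again.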

Describe Some Common SAS Informats.

The common SAS Informats are:

Type	Example Informats	Usage
Numeric	`8.`, `COMMA9.`, `PERCENT8.`	Reads standard, comma-separated, or percentage numbers
Character	`$10.`, `$CHAR20.`	Reads fixed-length text
Date	`DATE9.`, `MMDDYY10.`, `YYMMDD10.`	Converts text to SAS dates
Time	`TIME8.`, `DATETIME20.`	Reads time/datetime values

Describe when to use SAS Informats.

SAS informats should be used when:

Importing external files (CSV, text).
Reading non-standard data (e.g., dates in different formats).
Converting raw text into usable SAS variables.

Machine Learning Interview Questions

Apr 24, 2025 by Muhammad Imdad Ullah

Prepare for your next ML interview with these essential machine learning interview questions! Learn key concepts like training vs. test sets, popular algorithms (Linear Regression, SVM, Random Forest), classifiers, and model selection. Understand why data splitting matters and see real-world examples. Perfect for aspiring data scientists and ML engineers—boost your knowledge and ace your interview.

Machine Learning Interview Questions

Mastering machine learning interview questions is crucial for landing top AI/ML roles. These questions test your fundamental understanding of key concepts like algorithms, model evaluation, and real-world problem-solving. By preparing targeted ML interview questions, candidates demonstrate technical expertise, analytical thinking, and the ability to apply theory to practical scenarios, which is exactly what hiring managers seek in data science and machine learning roles.

What is machine learning?

Machine learning is a branch of computer science concerned with building systems that automatically learn and improve with experience. For example, a robot can be programmed to perform tasks based on the data it gathers from sensors, so its behaviour is learned from that data rather than written out by hand.

In other words, Machine Learning (ML) is a subset of artificial intelligence (AI) that enables systems to learn from data and improve their performance without explicit programming. Instead of following fixed rules, ML algorithms identify patterns, make predictions, or take actions based on training data.

What are the Key Points of machine learning?

The key points of machine learning are:

Learns from Data: Accuracy typically improves as more good-quality data becomes available.
Automates Decisions: Used in recommendations, fraud detection, speech recognition, etc.
Types: Supervised (labeled data), Unsupervised (no labels), Reinforcement (trial & error).

What are “Training Set” and “Test Set”?

In machine learning, the training set and test set are defined as follows:

Training Set: The portion of data used to train a machine learning model. The model learns patterns from this data.
Test Set: A separate portion of data used to evaluate the model’s performance after training. It checks how well the model generalizes to unseen data.

In various areas of information science, such as machine learning, the dataset used to discover potentially predictive relationships is known as the ‘training set’. The training set is the set of examples given to the learner, while the test set is the set of examples held back from the learner and used to test the accuracy of the hypotheses it generates. The two sets are kept distinct.

For example, suppose you have 1,000 data points; you might use 800 for training and 200 for testing.
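
The 800/200 split described above can be sketched in a few lines of standard-library Python. This is a minimal illustration: the function name train_test_split, the 0.2 test fraction, and the fixed seed are choices made for this example only, and the integers stand in for real data points.

```python
import random


def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle a copy of `data` and split it into (train, test) lists."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(data)          # copy, so the caller's list is not reordered
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


points = list(range(1000))         # stand-in for 1,000 data points
train, test = train_test_split(points, test_fraction=0.2)
print(len(train), len(test))       # 800 200
```

Shuffling before splitting matters when the data is stored in some order (by date or by class, for example); without it, the test set may not resemble the training set.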

Why Split Data in machine learning algorithms?

Data is split into training and test sets because the split:

Helps detect overfitting (memorizing training data instead of learning useful patterns), which cannot be seen from training accuracy alone.
Provides an estimate of real-world accuracy before deployment.
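
A small experiment shows why a held-out test set is needed. In the sketch below, the labels are coin flips, so there is no pattern to learn, and the "model" is just a lookup table that memorizes the training examples. Everything here (the synthetic data, the seed, the memorizing model) is an assumption made for illustration.

```python
import random

rng = random.Random(42)

# 1,000 examples whose labels are pure coin flips: nothing can be learned.
data = [(x, rng.choice([0, 1])) for x in range(1000)]
rng.shuffle(data)
train, test = data[:800], data[800:]

# A "model" that memorizes every training example.
memory = dict(train)
# Fallback for inputs it has never seen: the most common training label.
majority = round(sum(label for _, label in train) / len(train))


def predict(x):
    return memory.get(x, majority)


def accuracy(examples):
    return sum(predict(x) == y for x, y in examples) / len(examples)


train_acc = accuracy(train)
test_acc = accuracy(test)
print(f"train accuracy: {train_acc:.2f}")   # perfect, because of memorization
print(f"test accuracy:  {test_acc:.2f}")    # close to chance
```

Training accuracy alone would suggest a perfect model. Only the test set shows that nothing useful was learned.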

Name Five Popular Algorithms of Machine Learning

Five popular machine learning algorithms are:

Linear Regression: Used for predicting continuous values; fits a straight line to the data.
Logistic Regression: Used for binary classification (such as spam detection); predicts probabilities between 0 and 1.
Decision Trees: Work for classification and regression (such as loan approval); split data into branches based on feature values.
Random Forest: An ensemble method (multiple decision trees combined) that typically reduces overfitting and improves accuracy compared with a single tree.
Support Vector Machine (SVM): Effective for classification tasks (such as image recognition); finds the best boundary (hyperplane) between classes.

Other widely used algorithms include:

Neural Networks: Used in deep learning to model complex patterns.
K-Nearest Neighbour (KNN): A simple, instance-based learning method.

What is a classifier in machine learning?

A classifier in machine learning is an algorithm that assigns a label or category to input data based on its features. It is used in supervised learning where the model is trained on labeled data to predict discrete outcomes (classes).

What are the key points of a classifier in machine learning?

The key points are:

Purpose: Categorizes data (e.g., spam vs. not spam, cat vs. dog).
Examples of Classifiers:
- Logistic Regression
- Decision Trees
- Random Forest
- Support Vector Machines (SVM)
- Neural Networks
Works by: Learning patterns from labeled training data, then predicting labels for new, unseen data.

Give an example that explains the concept of a classifier in machine learning

An email classifier predicts whether an incoming email is “spam” or “not spam.”
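
The spam example can be sketched with a tiny Naive Bayes text classifier written in plain Python. Naive Bayes is not one of the classifiers listed above; it is used here only because it fits in a few lines. The six training emails are made up for illustration, and a real spam filter would need far more data and preprocessing.

```python
import math
from collections import Counter

# Made-up labeled training data (illustrative only).
training_emails = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch on monday with the team", "not spam"),
    ("project report attached for review", "not spam"),
]


def train(examples):
    """Count how often each word appears in each class."""
    word_counts = {"spam": Counter(), "not spam": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["not spam"])
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest Naive Bayes log-score."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # log prior + log likelihood of each word, with add-one smoothing
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(training_emails)
print(classify("free prize inside", model))        # spam
print(classify("monday project meeting", model))   # not spam
```

The classifier never sees a rule such as "free means spam"; it infers that association from the labeled examples, which is what makes this supervised learning.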

What is Model Selection in Machine Learning?

Model selection is the process of choosing among different mathematical models that are used to describe the same data set. It is applied in statistics, machine learning, and data mining.

In practice, this means choosing the best-performing algorithm (or model) for a given dataset and problem. It involves comparing different models, tuning their parameters, and selecting the one that generalizes well to unseen data.

The key aspects of model selection in machine learning are:

Performance Comparison – Evaluating models using metrics (e.g., accuracy, precision, F1-score).
Cross-Validation – Testing models on different subsets of data to ensure reliability.
Bias-Variance Tradeoff – Balancing underfitting (too simple) vs. overfitting (too complex).
Hyperparameter Tuning – Optimizing model settings for better performance.

For example, choosing between a Random Forest and an SVM for a classification task based on cross-validation scores.
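
The selection procedure can be sketched with k-fold cross-validation in plain Python. Random Forest and SVM are too large to write out here, so two much simpler stand-in models are compared instead: a majority-class baseline and a nearest-centroid classifier. The synthetic one-dimensional data, the seed, and all function names are assumptions made for this illustration.

```python
import random
from statistics import mean

rng = random.Random(0)

# Synthetic 1-D data: class 0 centred at 0.0, class 1 centred at 3.0.
data = ([(rng.gauss(0.0, 1.0), 0) for _ in range(100)]
        + [(rng.gauss(3.0, 1.0), 1) for _ in range(100)])
rng.shuffle(data)


def fit_majority(train):
    """Baseline: always predict the most common training label."""
    ones = sum(y for _, y in train)
    label = 1 if ones * 2 >= len(train) else 0
    return lambda x: label


def fit_nearest_centroid(train):
    """Predict the class whose training mean is closest to x."""
    centroids = {c: mean(x for x, y in train if y == c) for c in (0, 1)}
    return lambda x: min(centroids, key=lambda c: abs(x - centroids[c]))


def cross_val_accuracy(fit, data, k=5):
    """Mean accuracy over k folds; each fold is held out once for testing."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        predict = fit(train)
        correct = sum(predict(x) == y for x, y in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


candidates = {
    "majority baseline": fit_majority,
    "nearest centroid": fit_nearest_centroid,
}
cv_scores = {name: cross_val_accuracy(fit, data) for name, fit in candidates.items()}
best = max(cv_scores, key=cv_scores.get)
print(cv_scores, "->", best)
```

The same loop works for any pair of models that expose a fit function returning a predictor. Only the entries in candidates would change.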

Experimental Design Quiz 12

Apr 22, 2025 by Muhammad Imdad Ullah

Think you know your multiple comparison tests? Take this Experimental Design Quiz to assess your understanding of Tukey’s Test, Scheffé’s Test, Fisher’s LSD Test, Duncan’s Multiple Range Test, Dunnett’s Test, and Stepwise Multiple Comparisons.

These post-hoc tests are essential in ANOVA to identify significant differences between treatment means while controlling Type I error. Whether you are a statistics student, researcher, or data analyst, this Experimental Design Quiz will challenge your grasp of treatment comparisons, statistical significance, and hypothesis testing.

Online Experimental Design Quiz with Answers

Topics Covered in this Experimental Design Quiz are:

Tukey’s HSD Test
Scheffé’s Method
Fisher’s Least Significant Difference (LSD) Test
Duncan’s Multiple Range Test
Dunnett’s Test for Control Comparisons
Stepwise Multiple Comparison Procedures

Ready to test your skills? Take the quiz now to see how well you understand statistical comparisons in experimental design.

Online Experimental Design Quiz with Answers

What characteristic of an experiment is missing from a quasi-experimental design?
A study with random assignment can conclude that the explanatory variables caused the response variable.
Which of the following can increase the rigor of a quasi-experimental study?
Tukey’s Test is used as a:
Scheffé’s Test is a statistical test that is used to make ________ comparisons among the treatment means.
Scheffé’s method is more useful when we want to compare:
Scheffé’s method uses:
The LSD test is one of the multiple comparison tests that are useful when we are interested in comparing:
LSD is the extension of:
The LSD test uses:
Tukey’s test procedure is based on:
Tukey’s test deals with ________ means regardless of how many means are in the group:
Tukey’s test uses:
In order to apply Duncan’s Multiple Range (DMR) Test we have to:
Duncan’s Multiple Range (DMR) Test is used to compare:
The DMR test statistic uses:
If the analyst is interested in comparing each of the treatments with the control, we may choose:
In Dunnett’s Test we use:
Dunnett’s Test uses the difference between a treatment mean and the mean of:
A stepwise multiple comparisons procedure used to identify sample means that are significantly different from each other is:
