PROC in SAS

This comprehensive Q&A-style guide about PROC in SAS Software breaks down fundamental SAS PROCs used in statistical analysis and data management. Learn:
What PROCs do and their key functions.
Differences between PROC MEANS & SUMMARY.
When to use PROC MIXED for mixed-effects models.
CANCORR vs CORR for multivariate vs bivariate analysis.
Sample PROC MIXED code with required statements.
How PROC PRINT & CONTENTS help inspect data.

Ideal for students learning SAS Programming and statisticians performing advanced analyses. Includes ready-to-use code snippets and easy comparisons!

Q&A PROC in SAS Software

Explain the functions of PROC in SAS.

PROC (Procedure) is a fundamental component of SAS programming that performs specific data analysis, reporting, or data management tasks. Each PROC is a pre-built routine designed to handle different statistical, graphical, or data processing operations. The key functions of PROC in SAS are:

  • Data Analysis & Statistics: PROCs perform statistical computations, including:
    • Descriptive Statistics (PROC MEANS, PROC SUMMARY, PROC UNIVARIATE)
    • Hypothesis Testing (PROC TTEST, PROC ANOVA, PROC GLM)
    • Regression & Modeling (PROC REG, PROC LOGISTIC, PROC MIXED)
    • Multivariate Analysis (PROC FACTOR, PROC PRINCOMP, PROC DISCRIM)
  • Data Management & Manipulation
    • Sorting (PROC SORT)
    • Transposing Data (PROC TRANSPOSE)
    • Merging & Combining Datasets (PROC SQL, PROC APPEND)
  • Reporting & Output Generation
    • Printing Data (PROC PRINT)
    • Creating Summary Reports (PROC TABULATE, PROC REPORT)
    • Generating Graphs (PROC SGPLOT, PROC GCHART)
  • Quality Control & Data Exploration
    • Checking Data Structure (PROC CONTENTS)
    • Identifying Missing Data (PROC FREQ with MISSING option)
    • Sampling Data (PROC SURVEYSELECT)
  • Advanced Analytics & Machine Learning
    • Cluster Analysis (PROC CLUSTER)
    • Time Series Forecasting (PROC ARIMA)
    • Text Mining (PROC TEXTMINE)

    PROCs are the backbone of SAS programming, enabling data analysis, manipulation, and reporting with minimal coding. Choosing the right PROC depends on the task—whether it’s statistical modeling, data cleaning, or generating business reports.

    Explain the Difference Between PROC MEANS and PROC SUMMARY.

    Both PROC MEANS and PROC SUMMARY in SAS compute descriptive statistics (e.g., mean, sum, min, max), but they differ in default behavior and output:

    • Default Output
      PROC MEANS: Automatically prints results in the output window.
      PROC SUMMARY: Does not print by default; requires the PRINT option.
    • Dataset Creation
      Both can store results in a dataset using OUT=.
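    • Default Statistics and Variables
      Both produce N, MEAN, STD, MIN, and MAX when no statistic keywords are given.
      PROC MEANS: Without a VAR statement, analyzes all numeric variables.
      PROC SUMMARY: Without a VAR statement, reports only a count of observations.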
    • Usage Context
      Use PROC MEANS for quick interactive analysis.
      Use PROC SUMMARY for programmatic, non-printed summaries.

    PROC MEANS is more convenient for direct, interactive analysis, while PROC SUMMARY suits automated reporting where the results go to a dataset.

    In both procedures, a BY statement produces statistics for each subgroup, but the input data must first be sorted (or indexed) by the BY variables.

    A CLASS statement also produces subgroup statistics and needs no sorting. Combined with an OUTPUT statement, it writes the statistics for the overall data and for every combination of the CLASS variables to a single dataset.
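The contrast above can be sketched with the SASHELP.CLASS sample dataset that ships with SAS. This is a minimal illustration, not the only way to use either procedure; the output dataset name work.class_stats is made up for the example.

```sas
/* PROC MEANS prints its results by default */
proc means data=sashelp.class n mean min max;
   class sex;                 /* subgroups; no pre-sorting needed with CLASS */
   var height weight;
run;

/* PROC SUMMARY prints nothing by default; results go to a dataset */
proc summary data=sashelp.class;
   class sex;
   var height weight;
   output out=work.class_stats mean= / autoname;   /* hypothetical dataset name */
run;

/* _TYPE_=0 is the overall row, _TYPE_=1 holds one row per level of sex */
proc print data=work.class_stats;
run;
```

Adding the PRINT option to the PROC SUMMARY statement would make it display results in the same way as PROC MEANS.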


    What is the PROC MIXED Procedure in SAS STAT used for?

    PROC MIXED is a SAS/STAT procedure for fitting linear mixed-effects models, which account for both fixed and random effects in data. It is widely used for analyzing hierarchical, longitudinal, or clustered data where observations are correlated (e.g., repeated measures, multilevel data).

    A mixed model can accommodate several sources of variation in the data, allows different variances for different groups, and takes into account the correlation structure of repeated measurements.

    This flexibility in modeling random effects and covariance structures makes PROC MIXED a cornerstone of advanced statistical analysis in SAS.

    What is the Difference Between CANCORR and CORR Procedures in SAS STAT?

    Both procedures analyze relationships between variables, but they serve distinct purposes:

    1. PROC CORR (Correlation Analysis): Computes simple pairwise correlations (e.g., Pearson, Spearman). Use it to examine associations between two or more variables when there is no distinction between dependent and independent variables. Its output includes the correlation matrix, p-values, and descriptive statistics. The code below tests how height, weight, and age are linearly related.

      PROC CORR DATA=my_data;
      VAR height weight age;
      RUN;

    2. PROC CANCORR (Canonical Correlation Analysis): Analyzes multivariate relationships between two sets of variables by finding linear combinations (canonical variables) that maximize the correlation between the sets. It is also useful for dimension reduction (e.g., linking psychological traits to behavioral measures). Its output includes canonical correlations, coefficients, and (with the REDUNDANCY option) redundancy analysis.

      PROC CANCORR DATA=my_data;
      VAR set1_var1 set1_var2;  /* First variable set */
      WITH set2_var1 set2_var2; /* Second variable set */
      RUN;

    Key Differences Summary

    Feature       | PROC CORR                        | PROC CANCORR
    Analysis Type | Bivariate correlations           | Multivariate (set-to-set)
    Variables     | Single list (no grouping)        | Two distinct sets (VAR & WITH)
    Output Focus  | Pairwise coefficients (e.g., r)  | Canonical correlations (ρ)
    Complexity    | Simple, descriptive              | Advanced, inferential

    Write a sample program using the PROC MIXED procedure, including all the required statements

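    Only the PROC MIXED and MODEL statements are required; CLASS, RANDOM, and REPEATED are optional. The program below models petal length by species using the SASHELP.IRIS sample data.

    proc mixed data=SASHELP.IRIS plots=all;
    class species;                           /* declare the categorical predictor */
    model petallength = species / solution;  /* MODEL is required; SOLUTION prints fixed-effect estimates */
    run;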
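Because the sample above has no random effects, a sketch with a RANDOM statement may help show where PROC MIXED differs from an ordinary linear model. The dataset work.visits, its variables, and all simulated values are invented for illustration.

```sas
/* Simulated data: 10 subjects, 4 visits each (names and values are made up) */
data work.visits;
   call streaminit(2024);
   do subject = 1 to 10;
      u = rand('normal', 0, 2);              /* subject-level random intercept */
      do visit = 1 to 4;
         y = 50 + 1.5*visit + u + rand('normal', 0, 1);
         output;
      end;
   end;
   drop u;
run;

proc mixed data=work.visits;
   class subject;
   model y = visit / solution;               /* fixed effect of visit */
   random intercept / subject=subject;       /* random intercept for each subject */
run;
```

The RANDOM statement lets observations from the same subject share a common deviation, which is one way to account for correlation among repeated measurements.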

    Describe what PROC PRINT and PROC CONTENTS are used for.

    PROC CONTENTS displays information about a SAS dataset (its metadata), while PROC PRINT displays the data values so that you can confirm the data were read into the dataset correctly.

    1. PROC CONTENTS: Displays metadata about a SAS dataset (structure, variables, attributes). Its key uses are:
    • Check variable names, types (numeric/character), lengths, and formats.
    • Identify dataset properties (e.g., number of observations, creation date).
    • Debug data import/export issues (e.g., mismatched formats).

    The general syntax of PROC CONTENTS is:

    PROC CONTENTS DATA=your_data;
    RUN;

    2. PROC PRINT: Displays raw data from a SAS dataset to the output window. Its key uses are:
    • View actual observations and values.
    • Verify data integrity (e.g., missing values, unexpected codes).
    • Quick preview before analysis.

    The general syntax of PROC PRINT is:

    PROC PRINT DATA=your_data (OBS=10);  /* Prints first 10 rows */
    VAR var1 var2;                       /* Optional: limit columns */
    RUN;


    Functions in SAS

    The post is about Functions in SAS Software. Functions in SAS software are predefined routines that perform specific computations or transformations on data. They can be categorized into several types based on their functionality.

    Introduction to Functions in SAS Software

    SAS functions are predefined operations that perform specific computations on data, categorized by their purpose. Numeric functions handle mathematical calculations like rounding, summing, and logarithms. Character functions manipulate text data through substring extraction, case conversion, and concatenation. Date and time functions manage SAS date, time, and datetime values, enabling operations like extracting year/month/day or shifting dates by intervals.

    Statistical functions compute summary metrics such as mean, median, and standard deviation. Financial functions support business calculations like net present value and loan payments. Random number functions generate values from statistical distributions for simulations. Bitwise functions perform low-level binary operations. Array functions assist in managing array dimensions and bounds. Special functions include utilities for data type conversion and lagged value retrieval. Finally, file and I/O functions check file existence and manage input/output operations. Together, these functions streamline data processing, analysis, and reporting in SAS.

    Here are the main types of functions in SAS Software:

    Numeric Functions

    Perform mathematical operations on numeric values. These functions are also called arithmetic functions.

    Function        | Short Description
    SUM()           | Sum of arguments
    MEAN()          | Arithmetic mean
    MIN() / MAX()   | Minimum/Maximum value
    ROUND()         | Rounds a number
    INT()           | Returns integer part of a number
    ABS()           | Absolute value of the argument
    SQRT()          | Square root
    LOG() / LOG10() | Natural logarithm / base-10 logarithm
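A short DATA _NULL_ step is a convenient way to try these functions; the values below are arbitrary and the results are written to the SAS log.

```sas
data _null_;
   total   = sum(4, 9, .);          /* missing values are ignored: 13 */
   average = mean(4, 9, .);         /* 6.5 */
   lowest  = min(4, 9);             /* 4 */
   rounded = round(3.14159, 0.01);  /* 3.14 */
   whole   = int(-7.8);             /* -7 (truncates toward zero) */
   size    = abs(-7.8);             /* 7.8 */
   root    = sqrt(16);              /* 4 */
   ln      = log(1);                /* 0 (natural logarithm) */
   lg      = log10(1000);           /* 3 */
   put total= average= lowest= rounded= whole= size= root= ln= lg=;
run;
```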

    Random Number Functions in SAS

    These functions generate random numbers.

    Random Number Function | Short Description
    RANUNI()               | Generates random numbers from Uniform distribution
    RANNOR()               | Generates random numbers from a Normal distribution
    RANBIN()               | Generates random numbers from a Binomial distribution
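As a rough illustration, the step below draws five values from each distribution. The seed 12345 and the dataset name work.random_draws are arbitrary choices; a positive seed makes the sequence repeatable.

```sas
data work.random_draws;
   do i = 1 to 5;
      u = ranuni(12345);            /* Uniform on (0, 1) */
      z = rannor(12345);            /* Normal with mean 0, standard deviation 1 */
      b = ranbin(12345, 10, 0.3);   /* Binomial with n=10 trials, p=0.3 */
      output;
   end;
run;

proc print data=work.random_draws;
run;
```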

    Financial Functions

    The following are important and useful financial calculations.

    Financial Function | Short Description
    IRR()              | Internal rate of return
    NPV()              | Returns Net Present Value
    PMT()              | Loan payment calculation
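A small sketch of PMT, assuming the argument order rate per period, number of periods, principal. The loan figures are invented. IRR and NPV are left out here because their rate and frequency conventions deserve a check against the SAS documentation before use.

```sas
data _null_;
   /* Monthly payment on a 10,000 loan at 8% annual interest, repaid over 10 months */
   payment = pmt(0.08/12, 10, 10000);
   put payment= 10.2;
run;
```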

    Character Functions in SAS

    Manipulate and analyze text (string) data. These functions can also be classified as character-handling functions.

    Character Function   | Short Description
    SUBSTR()             | Extracts a substring from an argument
    SCAN()               | Extracts a specified word from a string
    TRIM() / STRIP()     | Removes trailing blanks / removes leading and trailing blanks
    UPCASE() / LOWCASE() | Converts to uppercase/lowercase
    CATX()               | Concatenates strings with a delimiter
    INDEX()              | Finds the position of a substring within a string
    COMPRESS()           | Removes specific characters from a string
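The following step exercises each function in the table on made-up strings and writes the results to the log.

```sas
data _null_;
   length full $40;
   text   = '  SAS programming guide  ';
   part   = substr('SAS programming', 5, 7);     /* 'program' */
   word   = scan('SAS programming guide', 2);    /* 'programming' */
   clean  = strip(text);                         /* leading and trailing blanks removed */
   upper  = upcase('sas');                       /* 'SAS' */
   lower  = lowcase('SAS');                      /* 'sas' */
   full   = catx('-', 'SAS', 'Base', '9.4');     /* 'SAS-Base-9.4' */
   pos    = index('SAS programming', 'pro');     /* 5 */
   digits = compress('(555) 123-4567', '() -');  /* '5551234567' */
   put part= word= clean= upper= lower= full= pos= digits=;
run;
```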

    Statistical Functions

    The following are some important functions for the computation of descriptive statistical measures.

    Descriptive Function     | Short Description
    MEAN(), MEDIAN(), MODE() | Returns measures of central tendency: mean, median, and mode of the data
    STD()                    | Returns standard deviation
    VAR()                    | Returns the variance
    N()                      | Returns the count of non-missing values
    NMISS()                  | Returns the count of missing values
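Note that these functions summarize the arguments passed within a single observation (across a row); summarizing a variable down the rows of a dataset is the job of a procedure such as PROC MEANS. A sketch with arbitrary values:

```sas
data _null_;
   avg     = mean(2, 4, 4, 7, .);     /* 4.25 (the missing value is ignored) */
   mid     = median(2, 4, 4, 7, .);   /* 4 */
   common  = mode(2, 4, 4, 7, .);     /* 4 */
   sd      = std(2, 4, 4, 7, .);
   varnce  = var(2, 4, 4, 7, .);
   present = n(2, 4, 4, 7, .);        /* 4 non-missing values */
   absent  = nmiss(2, 4, 4, 7, .);    /* 1 missing value */
   put avg= mid= common= sd= varnce= present= absent=;
run;
```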

    Date and Time Functions in SAS

    These functions handle SAS date, time, and datetime values.

    Function                 | Short Description
    TODAY() / DATE()         | Returns the current date
    MDY()                    | Creates a date from month, day, year
    YEAR() / MONTH() / DAY() | Extracts year/month/day
    INTCK()                  | Computes intervals between dates
    INTNX()                  | Increments a date by intervals
    DATEPART()               | Extracts the date from datetime
    TIMEPART()               | Extracts time from datetime
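A minimal sketch of the date functions on invented dates. DHMS, which builds a datetime from a date plus hours, minutes, and seconds, is not in the table and is used here only to create a datetime value for DATEPART and TIMEPART.

```sas
data _null_;
   start  = mdy(1, 15, 2023);                /* 15JAN2023 */
   finish = mdy(3, 10, 2023);                /* 10MAR2023 */
   yr     = year(start);                     /* 2023 */
   months = intck('month', start, finish);   /* 2 month boundaries crossed */
   next   = intnx('month', start, 1);        /* 01FEB2023 (start of the next month) */
   dt     = dhms(start, 14, 30, 0);          /* datetime for 15JAN2023 at 14:30:00 */
   d_only = datepart(dt);
   t_only = timepart(dt);
   put start= date9. finish= date9. yr= months= next= date9.
       d_only= date9. t_only= time8.;
run;
```

INTCK counts interval boundaries rather than elapsed time, which is why 15 January to 10 March gives 2 months.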

    Bitwise Functions

    The following functions perform bit-level operations.

    Function | Short Description
    BAND()   | Bitwise AND
    BOR()    | Bitwise OR
    BNOT()   | Bitwise NOT
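For instance, with the arbitrary values 12 (binary 1100) and 10 (binary 1010):

```sas
data _null_;
   a = band(12, 10);   /* 1100 AND 1010 = 1000, i.e. 8 */
   o = bor(12, 10);    /* 1100 OR  1010 = 1110, i.e. 14 */
   put a= o=;
run;
```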

    Array Functions

    The following functions work with arrays.

    Function            | Short Description
    DIM()               | Returns the size of an array
    HBOUND() / LBOUND() | Returns upper/lower bounds of an array
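These functions are most useful for writing loops that do not hard-code the array size. In the sketch below, the array name, its 2-to-5 index range, and its values are all invented.

```sas
data _null_;
   array scores{2:5} s2-s5 (70 80 90 100);   /* index runs from 2 to 5 */
   n_items = dim(scores);                    /* 4 elements */
   low     = lbound(scores);                 /* 2 */
   high    = hbound(scores);                 /* 5 */
   total   = 0;
   do i = lbound(scores) to hbound(scores);  /* loop adapts to the array bounds */
      total = total + scores{i};
   end;
   put n_items= low= high= total=;
run;
```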

    Special Functions

    Miscellaneous operations. These functions may be classified as conversion functions, too.

    Function      | Short Description
    INPUT()       | Converts character to numeric/date
    PUT()         | Converts value to formatted text
    LAG() / DIF() | Returns a previous value / the difference from a previous value
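A hedged sketch of the conversion and lag functions, using invented data lines and a made-up dataset name (work.converted):

```sas
data work.converted;
   input raw_amount $ raw_date :$10.;
   amount    = input(raw_amount, comma10.);   /* character to numeric */
   visit     = input(raw_date, mmddyy10.);    /* character to SAS date */
   visit_txt = put(visit, date9.);            /* numeric date to formatted text */
   previous  = lag(amount);                   /* amount from the previous observation */
   change    = dif(amount);                   /* amount minus lag(amount) */
   datalines;
1,200 01/15/2023
1,450 02/15/2023
980 03/15/2023
;
run;

proc print data=work.converted;
run;
```

On the first observation, previous and change are missing because there is no earlier value to return.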

    File and I/O Functions

    These functions handle file operations.

    Function    | Short Description
    FILEEXIST() | Checks if a file exists
    FEXIST()    | Checks if the external file associated with a fileref exists
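A minimal check with FILEEXIST might look like this; the path is hypothetical and should be replaced with a real one.

```sas
%let path = C:\SAS\Data\sales.csv;   /* hypothetical path */

data _null_;
   if fileexist("&path") then put "Found: &path";
   else put "Missing: &path";
run;
```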

    The SAS functions described above help with data cleaning, transformation, and analysis in SAS programming.


    Essential SAS Interview Questions

    This blog post covers essential SAS interview questions to help aspiring data analysts and SAS programmers prepare for technical interviews. It explains core concepts like the basic elements of a SAS program, creating permanent datasets, the role of the DATA step, and how SAS informats work. Each question is answered concisely with practical examples, making it a quick yet comprehensive guide for interview preparation related to SAS Programming.


    What are the Basic Elements needed to run a SAS Program?

    To run a SAS program, the following basic elements are needed:

    • SAS Software – Install SAS (Base SAS, SAS Studio, or SAS University Edition).
    • SAS Program – A SAS Programming Script containing:
      • DATA Step – To create or modify datasets.
      • PROC Step – To analyze or process data (e.g., PROC PRINT, PROC MEANS).
    • Input Data – Can be internal (directly in the program) or external (CSV, Excel, etc.).
    • Output – Procedure Results (logs, reports, or new datasets).
    • SAS Environment – A workspace (SAS Display Manager, SAS Studio, or Enterprise Guide).

    A SAS program must also follow these basic rules:

    • Every statement ends with a semicolon.
    • A DATA statement names the data set being created.
    • An INPUT statement describes the variables when raw data are read.
    • A RUN statement ends each step.
    • Words within a statement are separated by at least one space.

    How do you create a Permanent SAS Data Set?

    To create a permanent SAS dataset, one must:

    1. Assign a Library – Use the LIBNAME statement to link a folder where the dataset will be stored.
    2. Reference the Library – Prefix the dataset name with the library name.

    Example of Creating a Permanent Dataset in SAS

    LIBNAME mylib "C:\SAS\Data";  /* Define a library */  

    DATA mylib.permanent_data; /* Creates a permanent dataset */
    INPUT ID Name $ Age;
    DATALINES;
    1 imdad 45
    2 Usman 30
    3 Ali 24
    ;
    RUN;

    The following are key points to note

    • The dataset (permanent_data) is saved in the specified folder (C:\SAS\Data) even after the SAS session ends.
    • Without a LIBNAME statement, SAS stores datasets temporarily in the WORK library (deleted after the session).

    To access the data for later use:

    LIBNAME mylib "C:\SAS\Data";  
    PROC PRINT DATA=mylib.permanent_data;  
    RUN;  

    What is the DATA step in SAS?

    In SAS, the DATA step is a fundamental programming component used to:

    • Create or Modify Datasets – Read, transform, and manipulate data.
    • Process Raw Data – Import external files (CSV, Text, and Excel) or create data internally.
    • Perform Calculations & Conditional Logic – Using SAS functions, loops (DO-END), and IF-THEN-ELSE statements.
    • Clean & Prepare Data – Handle missing values, recode variables, merge datasets, etc.

    Key Features of the DATA Step:

    • Begins with the DATA statement (names the dataset).
    • Uses INPUT to define variables when reading raw data.
    • Can include SET, MERGE, UPDATE, or INFILE to work with existing data.
    • Ends with RUN; (or a subsequent PROC step).

    The DATA step is important because it:

    • Is the core of SAS data manipulation.
    • Usually runs before PROC (procedure) steps for analysis/reporting.
    • Allows complex data transformations before analysis.
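The features listed above can be sketched in one short DATA step that reads an existing dataset, adds a calculation, and applies conditional logic. It uses the SASHELP.CLASS sample data; the output dataset name, the new variables, and the age cutoff are made up.

```sas
data work.class_flags;
   set sashelp.class;                        /* read an existing dataset */
   bmi = (weight / height**2) * 703;         /* calculation (pounds and inches) */
   length age_group $8;
   if age < 13 then age_group = 'Child';     /* conditional logic */
   else age_group = 'Teen';
run;                                         /* RUN ends the step */

proc print data=work.class_flags (obs=5);    /* a PROC step then reports on the result */
run;
```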

    What is a SAS Data Set?

    A SAS dataset is a structured data file used in SAS programming, organized in a table format with:

    • Rows (Observations) – Represent individual records (e.g., customers, transactions).
    • Columns (Variables) – Represent data attributes (e.g., ID, Name, Age).

    The key Features of a SAS Data Set are:

    1. Stored in Libraries
      • Temporary: WORK library (deleted after session).
      • Permanent: Saved in a user-defined library (e.g., LIBNAME mylib "C:\Data";).
    2. Two Parts:
      • Descriptor (metadata like variable names, types).
      • Data (actual values).
    3. File Extension: .sas7bdat for datasets, .sas7bcat for catalogs.

    A SAS dataset is used for data storage, manipulation, and analysis in SAS procedures (PROC steps).

    What are SAS informats?

    SAS informats are instructions used to read raw data (for example, from files or datalines) and convert it into a SAS-readable format. They define how SAS interprets input data (numbers, dates, text, etc.).

    The key features of SAS informats are:

    • Used in INPUT statements (DATA step) or in a separate INFORMAT statement.
    • Syntax: INFORMAT variable_name informat.; or embedded in INPUT (e.g., INPUT DOB MMDDYY10.;).
    • Common types:
      • Numeric: 8. (standard numeric), COMMA9. (with commas like 1,000).
      • Character: $10. (reads 10 characters).
      • Date/Time: DATE9. (e.g., 01JAN2023), MMDDYY10. (e.g., 01/01/2023).

    The following is an example of SAS informats.

    DATA example;
    /* Column pointers (@n) assume each data line starts in column 1 */
    INPUT @1 Name $10. @12 DOB MMDDYY10. @23 Salary COMMA9.;
    DATALINES;
    Imdad      01/01/1990 50,000
    Usman      12/15/1985 75,000
    ;
    RUN;

    • $10. reads 10-character text.
    • MMDDYY10. reads dates in MM/DD/YYYY format.
    • COMMA9. reads numbers with commas (e.g., 50,000).

    Describe Some Common SAS Informats.

    The common SAS Informats are:

    Type      | Example Informats              | Usage
    Numeric   | 8., COMMA9., PERCENT8.         | Reads standard, comma-separated, or percentage numbers
    Character | $10., $CHAR20.                 | Reads fixed-length text
    Date      | DATE9., MMDDYY10., YYMMDD10.   | Converts text to SAS dates
    Time      | TIME8., DATETIME20.            | Reads time/datetime values

    Describe when to use SAS Informats.

    The SAS informats should be used when:

    • Importing external files (CSV, text).
    • Reading non-standard data (e.g., dates in different formats).
    • Converting raw text into usable SAS variables.
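As one possible illustration of these cases, the step below reads comma-delimited records whose date and amount fields are not in standard numeric form. The dataset name, variable names, and data lines are invented; for an external file, the INFILE statement would name the file instead of DATALINES.

```sas
data work.sales;
   infile datalines dsd;                        /* comma-delimited; quotes around values are removed */
   informat region $12. sale_date date9. amount comma12.;
   format   sale_date date9.;                   /* show the stored date in readable form */
   input region sale_date amount;
   datalines;
North,05JAN2023,"1,250"
South,17FEB2023,"980"
;
run;

proc print data=work.sales;
run;
```

Declaring the informats in an INFORMAT statement keeps the INPUT statement in simple list style, which copes with fields of varying width.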
