Google Search Tricks and Tips

Here are some of the most useful Google Search tricks and tips, from basic operators to newer features. Let us start with the tricks that are especially helpful for search queries by mathematicians and statisticians.


Google Search Trick 1: Double Quotes (for Exact Search)

The use of double quotes yields only the pages with the same words in the same order (containing a specific phrase) as what’s in the quotes.

Google Search Trick 2: Asterisk within Quotes (to Specify Unknown Words)

Searching for a phrase in double quotes with an asterisk matches variations of that phrase, with the asterisk standing in for the unknown words. For example, “* matrix in regression analysis” will yield pages that have different words before and after ‘matrix in regression analysis’, such as “hat matrix in regression analysis”, “the inverse-partitioned-matrix method in linear regression analysis”, “a matrix form, in regression analysis”, and “second important matrix in regression analysis”.

Google Search Trick 3: Minus Sign to Exclude Words from Search

If you want to exclude certain words from your search, put a minus sign directly before each of them. For example, hat matrix in regression -outlier returns pages related to the hat matrix in regression that do not contain the word “outlier”.

Google Search Trick 4: Tilde symbol (~) to Search for Similar Words

Placing the tilde symbol before a word searches for that word and its synonyms. For example, ~Cross Table will result in crosstable, crosstabulation, cross-table, and cross-table query, etc.

Google Search Trick 5: OR Operator for Multiple Words Searching

The OR operator returns pages that include either of the words before and after it; it must be typed in upper case. For example, residuals OR error will return pages containing either “residuals” or “error”.

Google Search Trick 6: Numerical Range

A numerical range, written as two numbers separated by two dots, returns pages that contain numbers within that range. For example, “History of Statistics 2000..2019”.

Google Search Trick 7: Finding the Meaning of a Word or Phrase

The define keyword is used to look up the meaning of a word or phrase. For example, define statistics, define: residuals or error, and define: goodness of fit test.

Google Search Trick 8: Search a Particular Website

The site: function restricts the search to a particular website or domain; there is no space after the colon. For example, Learning statistics site:edu will result in pages found on .edu websites. Other examples are Learning statistics site:com and Learning statistics site:itfeature.com, etc.


Google Search Trick 9: Search Webpages Linked to a Particular Website

The link: function searches for web pages that link to a particular website. For example, link:itfeature.com and link:stat.bzu.edu.pk.
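The operators described so far can be combined in a single query. The following is only a minimal sketch: `build_query` and `search_url` are hypothetical helper names, and the base URL with its `q` parameter is an assumption about how a Google search link is formed, not something stated in this post.

```python
from urllib.parse import urlencode

# Assumed base URL for a Google web search; the query text goes in `q`.
SEARCH_URL = "https://www.google.com/search"


def build_query(phrase=None, words=(), exclude=(), either=(), site=None):
    """Combine search operators into one query string (hypothetical helper)."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')          # exact phrase in double quotes
    parts.extend(words)                      # ordinary keywords
    if either:
        parts.append(" OR ".join(either))    # OR is written in upper case
    parts.extend(f"-{w}" for w in exclude)   # minus sign excludes a word
    if site:
        parts.append(f"site:{site}")         # no space after the colon
    return " ".join(parts)


def search_url(query):
    """Return a URL-encoded search link for the query."""
    return f"{SEARCH_URL}?{urlencode({'q': query})}"


query = build_query(phrase="hat matrix", words=["regression"],
                    exclude=["outlier"], site="itfeature.com")
print(query)
print(search_url(query))
```

Keeping the operator logic in one small function makes it easy to see which operator contributes which part of the final query.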

Google Search Trick 10: Math Answers

Google performs basic math functions, for example, 4.7, 30% of 55, 20^2, sqrt(4), exp(4), log(10), cos(90), etc.
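To cross-check such answers, the same expressions can be evaluated locally. This is a rough sketch in Python; it assumes that log means the base-10 logarithm and that cos takes its argument in radians, which is worth confirming in the calculator itself.

```python
import math

results = {
    "30% of 55": 55 * 30 / 100,
    "20^2": 20 ** 2,
    "sqrt(4)": math.sqrt(4),
    "exp(4)": math.exp(4),
    "log(10)": math.log10(10),   # assumption: log is base 10
    "cos(90)": math.cos(90),     # assumption: the argument is in radians
}

for expression, value in results.items():
    print(f"{expression} = {value}")
```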

Google Search Trick 11: Unit Conversion

Google converts units of measure. For example, 5 cm in feet, 100$ in PKR, 42 days in fortnights, 10 mph in speed of light, 100 miles in leagues, and 100 km in miles, etc.

Google Search Trick 12: Compare using “vs”

A one-by-one comparison can be searched using the “vs” keyword. For example, statistics vs parameters, descriptive vs inferential statistics, AIC vs BIC, and statistics vs mathematics.


Multicollinearity Introduction Explained Easy (2019)

For a classical linear regression model with multiple regressors (explanatory variables), there should be no exact linear relationship between the explanatory variables. The term collinearity or multicollinearity is used when one or more linear relationships exist among the variables.

Multicollinearity Term

The term multicollinearity refers to the violation of the assumption of “no exact linear relationship between the regressors”.

Ragnar Frisch introduced this term. Originally, it meant the existence of a “perfect” or “exact” linear relationship among some or all regressors of a regression model.

Consider a $k$-variable regression model involving explanatory variables $X_1, X_2, \cdots, X_k$. An exact linear relationship is said to exist if the following condition is satisfied.

\[\lambda_1 X_1 + \lambda_2  X_2 + \cdots + \lambda_k X_k=0,\]

where $\lambda_1, \lambda_2, \cdots, \lambda_k$ are constants that are not all zero simultaneously, and $X_1=1$ for all observations to allow for the intercept term.

Nowadays, the term multicollinearity is used not only for perfect multicollinearity but also for imperfect collinearity (the case where the $X$ variables are intercorrelated, but not perfectly). In that case,

\[\lambda_1X_1 + \lambda_2X_2 + \cdots + \lambda_kX_k + \upsilon_i=0,\]

where $\upsilon_i$ is a stochastic error term.


In the case of a perfect linear relationship among the explanatory variables (the correlation coefficient between two such regressors is $\pm 1$), the parameters become indeterminate (it is impossible to obtain values for each parameter separately) and the method of least squares breaks down. However, if the regressors are not intercorrelated at all, the variables are called orthogonal and there is no problem concerning the estimation of the coefficients.
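Why least squares breaks down can be seen from the matrix $X'X$, which must be inverted to obtain the estimates. The sketch below uses made-up data (the values of `x2` and `noise` are purely illustrative) and two hypothetical helpers, `gram` and `det3`, to compare an exact and a near-exact linear relationship.

```python
def gram(X):
    """Return X'X for a matrix given as a list of rows."""
    k = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(k)]
            for i in range(k)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


x2 = [1, 2, 3, 4, 5]
x3_exact = [2 * v for v in x2]                  # X3 = 2*X2: exact relationship
noise = [0.1, -0.2, 0.05, 0.15, -0.1]           # illustrative stochastic term
x3_near = [2 * v + u for v, u in zip(x2, noise)]

# The first column of ones corresponds to the intercept term (X1 = 1).
X_exact = [[1, a, b] for a, b in zip(x2, x3_exact)]
X_near = [[1, a, b] for a, b in zip(x2, x3_near)]

det_exact = det3(gram(X_exact))
det_near = det3(gram(X_near))

print(det_exact)  # zero: X'X is singular and cannot be inverted
print(det_near)   # non-zero: estimates exist, though they may be imprecise
```

With the exact relationship the determinant is zero, so no unique solution for the coefficients exists; with the stochastic term added, the determinant is non-zero and the coefficients can be estimated.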

Note that

  • Multicollinearity is not a condition that either exists or does not exist, but rather a phenomenon inherent in most relationships.
  • Multicollinearity refers to only a linear relationship among the $X$ variables. It does not rule out the non-linear relationships among them.

See the use of the mctest R package for diagnosing collinearity.

Coefficient of Determination Formula: Quick Guide 2019

In this post, we will discuss not only the coefficient of determination formula but also the use and computation of the coefficient of determination, which serves as a link between regression and correlation analysis.

Coefficient of Determination $R^2$ in Statistics

The R squared ($r^2$; the square of the correlation coefficient) shows the percentage of the total variation of the dependent variable ($Y$) that can be explained by the independent (explanatory) variable ($X$). For this reason, $r^2$ (r-squared) is sometimes called the coefficient of determination.

The coefficient of determination (R-squared) is commonly used in various fields, such as Social Science, Finance, and Economics, to evaluate the performance of regression models. It helps researchers understand how well their models capture the relationship between the variables being studied.

Since

\[r=\frac{\sum x_i y_i}{\sqrt{\sum x_i^2} \sqrt{\sum y_i^2}},\]

Coefficient of Determination Formula

\begin{align*}
r^2&=\frac{(\sum x_iy_i)^2}{(\sum x_i^2)(\sum y_i^2)}=\frac{\sum \hat{y}^2}{\sum y^2}\\
&=\frac{\text{Explained Variation}}{\text{Total Variation}}
\end{align*}

where $r$ shows the degree of covariability of $X$ and $Y$. Note that the formula used here is in deviation form, that is, $x=X-\bar{X}$ and $y=Y-\bar{Y}$, where $\bar{X}$ and $\bar{Y}$ are the means of $X$ and $Y$.

The link of $r^2$ between regression and correlation analysis can be considered from these points.

  • If all the observations lie on the regression line, then there will be no scatter of points around it. In other words, the total variation of variable $Y$ is explained completely by the estimated regression line, so there is no unexplained variation. That is,
    \[\frac{\sum e^2}{\sum y^2}=\frac{\text{Unexplained Variation}}{\text{Total Variation}}=0\]
    Hence, $r^2=1$ (that is, $r=\pm 1$).
  • If the regression line explains only part of the variation in variable $Y$, then there will be some unexplained variation, that is,
    \[\frac{\sum e^2}{\sum y^2}=\frac{\text{Unexplained Variation}}{\text{Total Variation}}>0\]
    then, $r^2$ will be smaller than 1.
  • If the regression line does not explain any part of the variation of variable $Y$, that is,
    \[\frac{\sum e^2}{\sum y^2}=\frac{\text{Unexplained Variation}}{\text{Total Variation}}=1\Rightarrow \sum y^2 = \sum e^2\]
    then, $r^2=0$.

These three cases follow because $r^2=1-\frac{\text{unexplained variation}}{\text{total variation}}$.
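The link between the correlation form and the variation form of $r^2$ can be checked numerically. The following sketch uses made-up data for a simple linear regression and works entirely in deviation form; the variable names are illustrative only.

```python
X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up observations

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Deviation form: x = X - mean(X), y = Y - mean(Y)
x = [xi - x_bar for xi in X]
y = [yi - y_bar for yi in Y]

sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
syy = sum(b * b for b in y)          # total variation

# Squared correlation coefficient
r2_corr = sxy ** 2 / (sxx * syy)

# Least squares slope and fitted values in deviation form
slope = sxy / sxx
y_hat = [slope * a for a in x]
e = [b - f for b, f in zip(y, y_hat)]

explained = sum(f * f for f in y_hat)
unexplained = sum(r * r for r in e)

r2_ratio = explained / syy           # explained / total
r2_resid = 1 - unexplained / syy     # 1 - unexplained / total

print(r2_corr, r2_ratio, r2_resid)
```

All three quantities agree up to floating-point rounding, because the total variation splits into the explained and the unexplained parts.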

Key Points about Coefficient of Determination

  • Overfitting: A model can achieve a high $R^2$ value by simply memorizing the training data, but the model might not perform well on unseen data.
  • Number of Predictors: Adding more independent variables to a model will tend to increase the $R^2$ value, but it does not necessarily mean the additional variables are statistically significant.
  • Alternative Metrics: To assess the nuance of the model fit, use other metrics like adjusted R-squared or residual analysis.

Keeping in mind the limitations of R-squared, the data analysts can use the coefficient of determination as a valuable tool to assess how well their models capture real-world relationships between variables.
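As an illustration of the adjusted R-squared mentioned above, the sketch below applies the usual formula $1-(1-R^2)\frac{n-1}{n-p}$, where $n$ is the number of observations and $p$ is taken here to be the number of estimated parameters including the intercept. The function name and the numbers are hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared; p counts all parameters, including the intercept."""
    if n <= p:
        raise ValueError("need more observations than parameters")
    return 1 - (1 - r2) * (n - 1) / (n - p)


# Same R-squared and sample size, different numbers of parameters.
small_model = adjusted_r2(0.90, n=20, p=2)
large_model = adjusted_r2(0.90, n=20, p=6)

print(small_model)
print(large_model)
```

For the same $R^2$, the model with more parameters receives the lower adjusted value, which is how the adjustment penalizes predictors that add little explanatory power.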

Note that there are two main ways to calculate R-squared value:

  1. Squared Correlation Coefficient: R-squared is the square of the correlation coefficient ($r$) between the predicted values ($\hat{y}$) from the model and the actual values of the dependent variable ($y$).
  2. Analysis of Variance (ANOVA): R-squared can also be calculated using the ratio of the explained variance to the total variance (variance in the dependent variable).

FAQs about Coefficient of Determination

  1. For a simple linear regression model, what is the link between the coefficient of correlation and the coefficient of determination?
  2. How is the coefficient of determination interpreted?
  3. How can the coefficient of determination be obtained from the ANOVA table?
  4. How can overfitting be identified from the value of $R^2$?
  5. What are the alternatives to $R^2$?
  6. What is the link between total variation, explained variation, and unexplained variation?
  7. What is the impact of adding extra explanatory variables to the linear regression model?
  8. What is the link between explained and unexplained variation?
  9. Give real-life examples in which the coefficient of determination is high.

Learn more about the Coefficient of Determination Formula and Definition in Statistics


Checking Normality of Error Term (2019)

Normality of Error Term

In multiple linear regression models, the sum of squared residuals (SSR) divided by $n-p$ (the degrees of freedom, where $n$ is the total number of observations and $p$ is the number of parameters in the model) is a good estimate of the error variance. In the multiple linear regression model, the residual vector is

\begin{align*}
e &=(I-H)y\\
&=(I-H)(X\beta+\varepsilon)\\
&=(I-H)\varepsilon
\end{align*}

where $H=X(X'X)^{-1}X'$ is the hat matrix for the regression model; the last step holds because $(I-H)X=0$.

Each component is $e_i=\varepsilon_i - \sum\limits_{j=1}^n h_{ij} \varepsilon_j$. Therefore, in multiple linear regression models, the normality of the residuals is not simply the normality of the error term.

Note that:

\[Cov(\mathbf{e})=(I-H)\sigma^2 (I-H)' = (I-H)\sigma^2\]

We can write $Var(e_i)=(1-h_{ii})\sigma^2$.

If the sample size ($n$) is much larger than the number of parameters ($p$) in the model (i.e., $n \gg p$), then $h_{ii}$ will be small compared to 1, since the $h_{ii}$ sum to $p$ and so average $p/n$, and $Var(e_i) \approx \sigma^2$.
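The effect of the sample size on $h_{ii}$ can be illustrated for the simple linear regression case ($p=2$), where the diagonal of the hat matrix has the closed form $h_{ii}=\frac{1}{n}+\frac{(x_i-\bar{x})^2}{\sum_j (x_j-\bar{x})^2}$. The sketch below uses equally spaced, made-up values of $x$; the function name `leverages` is hypothetical.

```python
def leverages(x):
    """Diagonal elements h_ii of the hat matrix for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((v - x_bar) ** 2 for v in x)
    return [1 / n + (v - x_bar) ** 2 / sxx for v in x]


h_small = leverages(list(range(1, 6)))     # n = 5
h_large = leverages(list(range(1, 51)))    # n = 50

for label, h in (("n = 5", h_small), ("n = 50", h_large)):
    # Var(e_i) = (1 - h_ii) * sigma^2, so 1 - h_ii is the variance factor.
    print(label,
          "sum of h_ii:", round(sum(h), 6),
          "largest h_ii:", round(max(h), 4),
          "smallest variance factor:", round(1 - max(h), 4))
```

In both cases the $h_{ii}$ sum to $p=2$, but with the larger sample each individual $h_{ii}$ is much smaller, so the variance of every residual is close to $\sigma^2$.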

In multiple regression models, a residual behaves like an error if the sample size is large. However, this is not true for a small sample size.

It is unreliable to check the normality of error term assumption using residuals from multiple linear regression models when the sample size is small.


Learn more about Hat matrix: Role of Hat matrix in Diagnostics of Regression Analysis.
