 # Correlational Analysis

## A nonexperimental research method that measures two variables and assesses the statistical relationship between them. It is usually used to identify dependencies (e.g. Discount-Revenue).

Correlational analysis is a type of nonexperimental research method that involves observing two variables in order to establish a statistical relationship (i.e., the correlation) between them. The aim of correlational research is to identify variables that are related to the extent that a change in one is accompanied by some change in the other.

In this type of design, relationships between and among a number of facts are sought and interpreted. Only the data, relationships, and distributions of variables are studied. Variables are not manipulated; they are identified and studied as they occur in a natural setting.
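For reference, the strength of such a relationship is most often quantified with Pearson's correlation coefficient. That Pearson's r is the measure meant here is an assumption, based on the description of correlation as a measure of linear relationship. For n paired observations (x_i, y_i) with sample means x̄ and ȳ:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The value ranges from -1 (perfect negative linear relationship) to +1 (perfect positive linear relationship), with 0 indicating no linear relationship.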

THE MOST COMMON MISTAKES IN CORRELATIONAL ANALYSIS:

• Correlation-Causation Error. Assuming that correlation implies causation is wrong. Causation, by definition, implies that one event is the result of another event occurring. Correlation is a statistical measure of the (linear) relationship between two variables. If one has data on any two variables, one may compute the correlation between them. Causation, on the other hand, is much more difficult to determine. One may use correlation to establish whether two variables may be related and warrant further exploration into whether there is a valid causal relationship.

• Non-Linear Correlation Error. A common mistake when interpreting correlation is to forget that the correlation coefficient measures only the linear relationship between the variables of interest. A good example is the array X: 0, 1, 2, 3, 4, 5; Y: 0, 1, 16, 81, 256, 625. If one computes the linear correlation between X and Y, one finds a coefficient of 0.86. However, the true relationship between the two variables is the exact function y = x^4, which is far stronger than the 0.86 suggests. Just computing a correlation coefficient may give a false read on the true relationship between variables. You need to interpret the results correctly.

• Over-generalization Error. Another common mistake regarding correlation is to over-generalize. To people with a mathematics background, correlations below 0.50 struggle to show a relationship, and most of them will not keep an explanatory variable in a modeling exercise if its correlation with the response variable falls below 0.40. In the social and marketing sciences, however, correlations at or above 0.25 or 0.30 are considered potential evidence of a relationship. People with a math background usually find correlations that low difficult to accept.

• Statistical Significance Error. This usually occurs when the sample is too small to yield statistically significant results. In statistical hypothesis testing, it leads to a Type I error (also known as a "false positive" finding or conclusion) or a Type II error (also known as a "false negative" finding or conclusion). To avoid the issue, use the sample size calculator I coded below.
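The figures in the non-linear example above can be checked with a short script. This is a minimal sketch: the helper names `pearson_r` and `ranks` are illustrative and not taken from any library, and the rank-based (Spearman) coefficient is added here only as one common way to notice a monotonic but non-linear relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Rank of each value (1 = smallest). Assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

x = [0, 1, 2, 3, 4, 5]
y = [v ** 4 for v in x]  # 0, 1, 16, 81, 256, 625

linear_r = pearson_r(x, y)              # linear (Pearson) correlation
rank_r = pearson_r(ranks(x), ranks(y))  # Spearman: Pearson applied to ranks

print(f"Pearson r  = {linear_r:.2f}")   # Pearson r  = 0.86
print(f"Spearman r = {rank_r:.2f}")     # Spearman r = 1.00
```

The linear coefficient reports 0.86 even though Y is fully determined by X, while the rank-based coefficient reaches 1.00 because the relationship is perfectly monotonic. Plotting the data before trusting a single coefficient is another simple safeguard.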

## Sample Size Calculator (correlational analysis):

* MDC - Minimum Detectable Correlation is the minimum correlation coefficient you would like to be able to detect. The industry standard varies, but I recommend keeping it over 0.2.

** Stat Power - statistical power (1−β) is the percentage of the time the minimum effect size will be detected, assuming it exists. The industry standard is 80%, but I recommend keeping it at 90%.

*** Significance - significance level (α) is the percentage of the time a correlation will be detected, assuming one does NOT exist. The industry standard is 5%.
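The three inputs above are enough to estimate a required sample size. The sketch below is not the calculator referred to in this document. It is one common approximation, based on the Fisher z-transformation, for a two-sided test of the null hypothesis that the true correlation is zero. The function name `sample_size_for_correlation` and its parameter names are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_correlation(mdc, power=0.90, significance=0.05):
    """Approximate sample size needed to detect a correlation of size `mdc`.

    Uses the Fisher z approximation for a two-sided test of rho = 0:
        n = ((z_{1 - alpha/2} + z_{power}) / atanh(|mdc|))^2 + 3
    """
    if not 0 < abs(mdc) < 1:
        raise ValueError("mdc must be strictly between -1 and 1, and non-zero")
    if not 0 < power < 1 or not 0 < significance < 1:
        raise ValueError("power and significance must be between 0 and 1")

    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - significance / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)                 # quantile for the desired power
    effect = math.atanh(abs(mdc))                       # Fisher z of the target correlation

    n = ((z_alpha + z_power) / effect) ** 2 + 3
    return math.ceil(n)  # round up to a whole number of observations

# MDC of 0.2 at the recommended 90% power and 5% significance
print(sample_size_for_correlation(0.2, power=0.90, significance=0.05))  # 259
# Same MDC at the 80% power quoted as the industry standard
print(sample_size_for_correlation(0.2, power=0.80, significance=0.05))  # 194
```

Because the required size grows roughly with the inverse square of the detectable correlation, small values of MDC get expensive quickly, which is one practical reason to keep it at 0.2 or above.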