Correlation


Before moving on to linear regression, let us learn about correlation. What is correlation? Many problems involve data containing more than one variable. For example, suppose we have data on the heights of husbands (X) and the heights of their wives (Y) at the time of marriage. A bivariate population is one made up of pairs of measurements, so it involves two variables. When dealing with two sets of data, we may raise the following two questions:
1. Is there any association or relationship between the two variables? If yes, to what extent?
2. Is there a cause-and-effect relationship between the two variables? That is, when one variable increases or decreases, does it affect the other variable? If yes, to what extent and in what direction?

Scatter diagram:

In a bivariate population you have a set of pairs of values such as (X1, Y1), (X2, Y2), (X3, Y3), (X4, Y4), …, (Xn, Yn). You can treat each pair as a two-dimensional coordinate, plot the points against the X axis and the Y axis, and get a graph as shown below. This graph is called a scatter diagram. A scatter diagram reveals two types of information:
1. Whether the variables are related
2. If so, what the nature of the relationship is
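A scatter diagram is usually drawn with a plotting library (for example matplotlib's `pyplot.scatter`), but the idea of treating each pair as a coordinate can be sketched with only the standard library. The `text_scatter` function and the height values below are hypothetical, chosen just for illustration.

```python
def text_scatter(points, width=20, height=10):
    """Render (x, y) pairs as a rough character-grid scatter diagram."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each coordinate to a grid cell (use 0 if the variable is constant)
        col = round((x - x_min) / (x_max - x_min) * (width - 1)) if x_max > x_min else 0
        row = round((y - y_min) / (y_max - y_min) * (height - 1)) if y_max > y_min else 0
        # Row 0 is printed first, so flip rows to put larger Y values at the top
        grid[height - 1 - row][col] = "*"
    lines = ["|" + "".join(cells) for cells in grid]  # "|" marks the Y axis
    lines.append("+" + "-" * width)  # bottom border marks the X axis
    return "\n".join(lines)


# Hypothetical heights in cm: (husband X, wife Y)
pairs = [(165, 155), (170, 160), (175, 165), (180, 168), (185, 172)]
print(text_scatter(pairs))
```

The stars rise from the bottom left to the top right, which is the upward cluster described next.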

a) Upward cluster: increasing trend

When one variable increases, the other variable also increases. The trend is increasing, and the variables move in the same (positive) direction.

b) Downward cluster: decreasing/declining trend

When one variable increases, the other variable decreases. The trend is decreasing, and the variables move in opposite (negative) directions. For example, consider the price of a car and its age: as the age of the car increases, its price decreases.

c) Curvilinear: the scatter plot shows a curved (non-linear) trend.

d) No relation: variable Y and variable X have no relationship with each other. The data points form a roughly round cloud, and the two variables are unrelated.

CORRELATION ANALYSIS:

Correlation analysis is a statistical technique used to describe:

1. the degree to which the variables are related to each other
2. the direction of the relationship and the influence of one variable on the other. The direction can be positive or negative.

If an increase in one variable is accompanied by a corresponding increase in the other, the relationship is called positive correlation.
If an increase (or decrease) in one variable is accompanied by a corresponding decrease (or increase) in the other, the relationship is called negative correlation.

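Correlation between two variables (r), using Pearson's product-moment formula:

$$
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
$$

where:

$X_i, Y_i$ are the i-th pair of observations, $\bar{X}$ and $\bar{Y}$ are the means of X and Y, and $n$ is the number of pairs.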
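Pearson's r divides the sum of the products of the deviations from the two means by the square root of the product of the two sums of squared deviations. A worked example on hypothetical husband/wife heights (in cm): with X = 170, 175, 180, 165, 185 and Y = 160, 165, 168, 155, 172, the means are X̄ = 175 and Ȳ = 164, Σ(X − X̄)(Y − Ȳ) = 210, Σ(X − X̄)² = 250 and Σ(Y − Ȳ)² = 178, so r = 210 / √(250 × 178) ≈ 0.995. The sketch below does the same arithmetic in Python; `pearson_r` is a hypothetical helper name, not a library function.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient computed from its definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    x_bar = sum(xs) / n  # mean of X
    y_bar = sum(ys) / n  # mean of Y
    dx = [x - x_bar for x in xs]  # deviations X_i - X_bar
    dy = [y - y_bar for y in ys]  # deviations Y_i - Y_bar
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Hypothetical heights in cm: husbands (X) and wives (Y)
husbands = [170, 175, 180, 165, 185]
wives = [160, 165, 168, 155, 172]
print(f"r = {pearson_r(husbands, wives):.3f}")  # prints r = 0.995
```

On Python 3.10+, `statistics.correlation(husbands, wives)` should give the same value.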

Correlation methods:

Pearson Correlation
Spearman Rank Correlation
Kendall’s Tau Rank Correlation
Point-Biserial Correlation
Phi-Coefficient
Cramér’s V

These methods are used to measure a) the strength and b) the direction of the relationship between variables.

Pearson Correlation:

Description: Measures the linear relationship between two continuous variables.

Usage: Suitable when both variables are continuous and (for the standard significance tests) approximately normally distributed.

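Range: r lies between -1 and +1. A value of +1 means a perfect positive linear relationship, -1 a perfect negative linear relationship, and 0 no linear relationship.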

Strength: Simple and the most widely used method; it quantifies the strength and direction of a linear relationship.

Limitation: The method is sensitive to outliers, and it captures only linear relationships.