Covariance and Correlation
Multivariate Distributions
- A multivariate random variable is a variable that consists of two or more random variables. It represents a collection of random outcomes, each corresponding to one of the component random variables.
- For example, consider a dataset containing the heights and weights of individuals. Here, height and weight are two random variables, and together they form a multivariate random variable representing each individual's height and weight pair.
- In general, a multivariate random variable can have any number of component random variables. It is often denoted as a vector, where each element of the vector represents one of the component random variables.
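As a rough illustration of the vector view, the sketch below stores each individual's (height, weight) pair as one observation and then pulls out the component variables. The `Person` type, its field names, and the numbers are all made up for this example.

```python
from typing import NamedTuple


class Person(NamedTuple):
    """One observation of a two-component random vector (hypothetical type)."""
    height_cm: float
    weight_kg: float


# Made-up sample: each element is one draw of the (height, weight) pair.
sample = [Person(160.0, 55.0), Person(170.0, 68.0), Person(180.0, 80.0)]

# Each component of the vector is an ordinary random variable on its own.
heights = [p.height_cm for p in sample]
weights = [p.weight_kg for p in sample]

print(heights)
print(weights)
```

Adding a third component (say, age) would only mean adding one more field to the vector.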
What is Covariance?
Covariance is a statistical measure of the systematic relationship between two random variables: it describes how a change in one variable is associated with a change in the other.
I. Covariance tells us the Direction
- A positive covariance indicates that as one variable increases, the other tends to increase as well.
- A covariance of zero indicates no linear relationship between the variables.
- A negative covariance indicates that as one variable increases, the other tends to decrease.
II. Measured on a scale
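- The covariance value can range from $-\infty$ to $+\infty$.
- The greater the magnitude of this number, the stronger the co-movement, although the magnitude also depends on the units in which the variables are measured.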
III. Formula
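$$\mathrm{Cov}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})$$
where $\bar{x}$ and $\bar{y}$ are the means of $X$ and $Y$, and $N$ is the total sample size.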
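A minimal sketch of the covariance computation, assuming the population form (the average product of deviations from the means, dividing by N rather than N - 1). The function name `covariance` and the data are illustrative, not taken from any library.

```python
from statistics import fmean


def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_bar, y_bar = fmean(xs), fmean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)


# Made-up data: study time vs. scores (move together).
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 80]

# Made-up data: temperature vs. hot beverage sales (move in opposite directions).
temps = [10, 15, 20, 25, 30]
sales = [90, 80, 65, 50, 40]

print(covariance(hours, scores))  # positive
print(covariance(temps, sales))   # negative
```

The sign of each result gives the direction of the relationship; the magnitudes are not comparable across the two datasets because the units differ.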
What is Correlation?
Correlation measures the relationship between two variables, indicating how one variable changes when the other does.
I. Types:
- Positive Correlation: When both variables move in the same direction. E.g., study time 📈, scores 📈
- Negative Correlation: When the variables move in opposite directions. E.g., temperature 📈, hot beverage sales 📉
II. Measured on a scale
- Correlation values lie in the range [-1, 1].
- Value of 1 → Perfect Positive correlation.
- Value of -1 → Perfect Negative correlation.
III. Formula
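$$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
where $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$.
To understand why dividing by the standard deviations gives a unit-free number, we have to look at what each part of the formula actually does.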
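Before breaking the formula down, here is a minimal sketch of it: covariance divided by the product of the two standard deviations. It assumes population statistics throughout (dividing by N); `correlation` is an illustrative name and the data are made up.

```python
from statistics import fmean, pstdev


def correlation(xs, ys):
    """Correlation coefficient: population covariance over the product of std devs."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)
    # pstdev is the population standard deviation; it is zero for constant
    # data, in which case the correlation is undefined (division by zero).
    return cov / (pstdev(xs) * pstdev(ys))


xs = [1, 2, 3, 4]
r_up = correlation(xs, [2, 4, 6, 8])    # ys rise in lockstep with xs
r_down = correlation(xs, [8, 6, 4, 2])  # ys fall in lockstep with xs

print(round(r_up, 6), round(r_down, 6))
```

The two datasets are exact straight lines, so they land on the two ends of the scale, up to floating-point rounding.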
《 I 》Covariance tells us the Direction
Covariance measures how two variables move together.
- If $X$ goes up and $Y$ goes up, covariance is positive.
- If $X$ goes up and $Y$ goes down, covariance is negative.
The Problem: Covariance is "unscaled." Its value depends on the units of measurement. If you calculate the covariance of heights and weights in meters and kilograms, you get one number. If you switch to feet and pounds, the covariance becomes roughly seven times larger, even though the relationship between the two variables hasn't changed.
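The unit dependence can be checked with a small sketch. The heights and weights below are made up, `covariance` is an illustrative helper using the population form, and the conversion factors are the usual approximate ones.

```python
from statistics import fmean

M_TO_FT = 3.28084    # feet per meter (approximate)
KG_TO_LB = 2.20462   # pounds per kilogram (approximate)


def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)


# Made-up measurements of the same four people.
heights_m = [1.60, 1.70, 1.80, 1.75]
weights_kg = [55.0, 68.0, 80.0, 72.0]

# Same people, different units.
heights_ft = [h * M_TO_FT for h in heights_m]
weights_lb = [w * KG_TO_LB for w in weights_kg]

cov_metric = covariance(heights_m, weights_kg)
cov_imperial = covariance(heights_ft, weights_lb)
ratio = cov_imperial / cov_metric

print(cov_metric, cov_imperial, ratio)
```

The ratio equals the product of the two conversion factors (about 7.2), regardless of which people are in the sample: rescaling either variable rescales the covariance by the same factor.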
《 II 》 Standard Deviation tells us the Scale
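The standard deviations ($\sigma_X$ and $\sigma_Y$) measure how spread out each variable is on its own, expressed in that variable's own units.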
《 III 》The Division "Normalizes" the Data
By dividing the covariance by the product of the standard deviations, we are essentially canceling out the units.
Think of it like this:
- Numerator: The joint variability of $X$ and $Y$ (Units: units of $X$ × units of $Y$).
- Denominator: The individual variability of $X$ and $Y$ (Units: units of $X$ × units of $Y$).
When you divide them, the units cancel out completely, leaving you with a pure number. This process is called Normalization or Standardization.
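One way to see the standardization at work is to convert each variable to z-scores (subtract the mean, divide by the standard deviation) and average their products, which is algebraically the same as covariance divided by the product of the standard deviations. The sketch below assumes population statistics; the helper names and data are made up.

```python
from statistics import fmean, pstdev


def z_scores(values):
    """Standardize: subtract the mean, divide by the population std dev."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def correlation(xs, ys):
    """Correlation as the average product of paired z-scores."""
    return fmean([zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))])


# Made-up measurements of the same four people in two unit systems.
heights_m = [1.60, 1.70, 1.80, 1.75]
weights_kg = [55.0, 68.0, 80.0, 72.0]
heights_ft = [h * 3.28084 for h in heights_m]
weights_lb = [w * 2.20462 for w in weights_kg]

r_metric = correlation(heights_m, weights_kg)
r_imperial = correlation(heights_ft, weights_lb)

print(r_metric, r_imperial)
```

Because z-scores are already unit-free, both calls give the same value up to floating-point rounding, whichever unit system the raw data used.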
| Parameter | Covariance | Correlation |
|---|---|---|
| Meaning | A measure of how much two random variables change together. | A statistical measure that indicates how strongly two variables are related. |
| What is it? | Unscaled measure of joint variability | Scaled version of covariance |
| Values | $(-\infty, +\infty)$ | $[-1, 1]$ |
| Change in Scale | Affects covariance | Does not affect correlation |
| Unit of Measurement | Measured in the product of the units of the two variables. | Dimensionless (no units) |
| Goal | To find the direction of the relationship. | To find the strength and direction of the relationship. |
| Formula | $\mathrm{Cov}(X, Y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})$ | $\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$ |