Today we will look at some of the fundamental concepts in ML: variance and standard deviation, and covariance and correlation.
Variance is a statistical measure of how far the numbers in a collection are scattered from the collection’s average. It tells you about the collection’s degree of dispersion. As a formula:

σ² = Σ (Xᵢ − μ)² / N

where σ² is the variance, N is the number of observations, Xᵢ is the i-th observation in the data set X, and μ is the mean.
Let’s look at how to compute variance in Python. The standard library’s statistics module provides the necessary tools. Note that statistics.variance() should only be used when the variance of a sample is needed (it divides by N − 1); for the variance of an entire population, use statistics.pvariance().
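As a minimal sketch of the two functions mentioned above, the snippet below computes both the sample and the population variance. The data values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical data set, used only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample variance: sum of squared deviations divided by (N - 1)
sample_var = statistics.variance(data)

# Population variance: sum of squared deviations divided by N
population_var = statistics.pvariance(data)

print("sample variance:", sample_var)
print("population variance:", population_var)
```

The two results differ only in the denominator, so the gap between them shrinks as the number of observations grows.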
Or it could be written in a few lines of code using NumPy:
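One possible NumPy version is sketched below, assuming NumPy is installed; the data are the same hypothetical values as before. It computes the population variance by hand and then shows the built-in np.var(), whose ddof argument switches to the sample variance.

```python
import numpy as np

# Hypothetical data set, used only for illustration
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Population variance written out by hand: mean of squared deviations
mean = data.mean()
population_var = np.mean((data - mean) ** 2)

# np.var() divides by N by default; ddof=1 divides by (N - 1) instead
builtin_var = np.var(data)
sample_var = np.var(data, ddof=1)

print(population_var, builtin_var, sample_var)
```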
Standard deviation is a measure of spread in statistics. Put simply, it tells us how tightly the data are concentrated around the mean: when the data cluster closely around the mean, the standard deviation is low, and when they are widely scattered, it is high. It can’t be negative. It is closely related to variance: the standard deviation is the square root of the variance, so it measures the deviation itself, whereas the variance measures the squared deviation. A useful property of the standard deviation is that, unlike the variance, it is expressed in the same units as the data. For example, if the data are expressed in kg, the standard deviation will also be in kg.
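A short sketch using the standard library’s statistics module illustrates this. The data are hypothetical; stdev() uses the sample convention (N − 1) and pstdev() the population convention (N).

```python
import math
import statistics

# Hypothetical weights in kg, used only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample standard deviation (divides by N - 1 before taking the root)
sample_sd = statistics.stdev(data)

# Population standard deviation (divides by N before taking the root)
population_sd = statistics.pstdev(data)

# The standard deviation is the square root of the corresponding variance
root_of_variance = math.sqrt(statistics.pvariance(data))

print("sample SD:", sample_sd)
print("population SD:", population_sd)
```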
Or, as with variance, it can be computed using NumPy:
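A possible NumPy version, again on hypothetical data and assuming NumPy is installed, might look like this:

```python
import numpy as np

# Hypothetical data set, used only for illustration
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Population standard deviation by hand: root of the mean squared deviation
population_sd = np.sqrt(np.mean((data - data.mean()) ** 2))

# np.std() divides by N by default; ddof=1 gives the sample version
builtin_sd = np.std(data)
sample_sd = np.std(data, ddof=1)

print(population_sd, builtin_sd, sample_sd)
```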
Correlation is one of the crucial statistics concepts and is very useful in ML. It describes how one variable changes in relation to another, and it gives us an idea of the strength of the relationship between the two variables. In some cases, it is useful to describe one quantity in terms of how it relates to another. For example, the more a student prepares for a midterm exam, the better the result is likely to be.
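The standard Pearson correlation coefficient can be written in terms of the covariance and the two standard deviations. The form below is a reconstruction under an assumption: it uses the sample convention (dividing by n − 1), chosen because the symbol definitions refer to sample standard deviations. The population convention divides by n instead.

```latex
r = \frac{\mathrm{COV}(x, y)}{\sigma_x \, \sigma_y},
\qquad
\mathrm{COV}(x, y) = \frac{\sum_{i=1}^{n} (X_i - \bar{x})(Y_i - \bar{y})}{n - 1}
```

The symbols are defined as follows: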
COV(x, y) = covariance of the variables x and y
σx = sample standard deviation of variable x
σy = sample standard deviation of variable y
Xᵢ = observation i of variable X
x̅ = mean of all observations of X
Yᵢ = observation i of variable Y
ȳ = mean of all observations of Y
n = Number of observations
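Using the symbols defined above, the correlation coefficient can be sketched in Python as follows. The study hours and exam scores are hypothetical values, and the sample convention (n − 1) is an assumption; the hand-written result is compared with NumPy’s np.corrcoef().

```python
import numpy as np

# Hypothetical data: hours of preparation and exam scores
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
scores = np.array([52.0, 60.0, 63.0, 71.0, 80.0])

n = len(hours)

# Sample covariance: sum of products of deviations divided by (n - 1)
cov_xy = np.sum((hours - hours.mean()) * (scores - scores.mean())) / (n - 1)

# Correlation: covariance divided by the product of sample standard deviations
r_manual = cov_xy / (np.std(hours, ddof=1) * np.std(scores, ddof=1))

# np.corrcoef() returns the correlation matrix; entry [0, 1] is r for the pair
r_numpy = np.corrcoef(hours, scores)[0, 1]

print(r_manual, r_numpy)
```

A value close to +1 indicates a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little or no linear relationship.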
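Covariance measures how much two variables change in tandem: it is positive when they tend to increase together and negative when one tends to decrease as the other increases. Its unit is the product of the units of the two variables, so covariance is affected by a change in scale, and its value can lie anywhere between -∞ and +∞. Correlation, by contrast, is unitless and always lies between -1 and +1.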
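The effect of scale can be illustrated with a small sketch. The heights and weights below are hypothetical, and np.cov() is used with its default sample convention (n − 1). Converting heights from metres to centimetres multiplies the covariance by 100 but leaves the correlation unchanged.

```python
import numpy as np

# Hypothetical data: heights in metres and weights in kg
height_m = np.array([1.60, 1.70, 1.75, 1.80, 1.90])
weight_kg = np.array([55.0, 65.0, 70.0, 80.0, 90.0])

# np.cov() returns the covariance matrix; entry [0, 1] is COV(x, y)
cov_m = np.cov(height_m, weight_kg)[0, 1]          # units: m * kg

# Same heights expressed in centimetres
height_cm = height_m * 100
cov_cm = np.cov(height_cm, weight_kg)[0, 1]        # units: cm * kg

# Correlation is unaffected by the change of units
corr_m = np.corrcoef(height_m, weight_kg)[0, 1]
corr_cm = np.corrcoef(height_cm, weight_kg)[0, 1]

print("covariance (m):", cov_m, "covariance (cm):", cov_cm)
print("correlation (m):", corr_m, "correlation (cm):", corr_cm)
```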