Are you looking for a computationally cheap, easy-to-explain linear estimator that’s based on simple mathematics? Look no further than OLS!

OLS stands for ordinary least squares. It is heavily used in econometrics, a branch of economics that applies statistical methods to extract insights from economic data.

The simplest linear regression algorithm assumes that the relationship between an independent variable (x) and a dependent variable (y) has the form y = mx + c, which is the equation of a line with slope m and intercept c.

In line with that, OLS is an estimator that chooses the values of m and c (from the above equation) so as to minimize the sum of the squares of the differences between the observed and predicted values of the dependent variable. That’s why it’s named ordinary least squares.

This sum of squared differences, also called the residual sum of squares, is the loss function that OLS minimizes. The smaller it is, the more closely the fitted line tracks the observed data.
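
To make the minimization concrete, here is a minimal sketch that computes m and c for a single predictor using the standard closed-form least-squares formulas. The five data points are made up for illustration, and the helper sum_squared_residuals is a hypothetical name, not part of any library.

```python
import numpy as np

# Made-up sample data (an assumption for illustration): y is roughly 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

# Closed-form OLS estimates for one predictor:
#   m = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
#   c = y_mean - m * x_mean
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
c = y_mean - m * x_mean


def sum_squared_residuals(slope, intercept):
    """Sum of squared differences between observed and predicted y."""
    predicted = slope * x + intercept
    return np.sum((y - predicted) ** 2)


print(f"m = {m:.3f}, c = {c:.3f}")
print("Loss at the OLS estimates:", sum_squared_residuals(m, c))
print("Loss with a slightly different slope:", sum_squared_residuals(m + 0.1, c))
```

Any other choice of slope or intercept gives a larger sum of squared residuals than the OLS estimates, which is what the last two lines illustrate.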

Advantages of OLS

  • OLS is easier to implement than many comparable econometric techniques, because the theory of least squares is simpler for a developer to grasp than that of other common approaches.
  • OLS rests on a simple mathematical concept, so it is easy to explain at a high level to stakeholders and other non-technical audiences.

Assumptions of OLS

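  • There should be no perfect multicollinearity: no independent variable should be an exact linear combination of the others. Strong but imperfect correlation between independent variables does not break OLS, although it inflates the standard errors of the estimates.
  • The error terms should have a mean of zero, conditional on the independent variables.
  • The sample used for the OLS regression model should be drawn randomly from the population.
  • All the error terms in the regression should have the same variance, a property known as homoscedasticity.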

OLS using Statsmodels

Statsmodels is a Python library, part of the scientific Python ecosystem, that focuses on data analysis, data science, and statistics. It’s built on top of the numerical library NumPy and the scientific library SciPy.

The Statsmodels package provides several classes for linear regression, including OLS. With OLS, fitting a linear regression is simple and the results are easy to interpret. We can perform the regression using the sm.OLS class, where sm is the conventional alias for statsmodels.api.

OLS method

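The sm.OLS constructor takes two array-like objects as input, called a and b here. a holds the independent variables and is generally a Pandas dataframe or a NumPy array of shape o × c, where o is the number of observations and c is the number of columns. b holds the dependent variable and is generally a Pandas series of length o or a one-dimensional NumPy array. Statsmodels names these arguments exog and endog, respectively, and expects the dependent variable first, so the call is sm.OLS(b, a). Note also that sm.OLS does not add an intercept column on its own; sm.add_constant can be used to add one to a.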

In the code below, OLS is implemented using the Statsmodels package:

OLS regression results

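  • R-squared, also called the coefficient of determination, is the proportion of the variance in the dependent variable that the model explains. It measures how well the regression line fits the data.
  • Adjusted R-squared corrects R-squared for the number of independent variables in the model, so adding a variable raises it only if the fit improves by more than would be expected by chance.
  • The t-statistic is the difference between a parameter’s estimated value and its hypothesized value, divided by the parameter’s standard error. In the results table the hypothesized value is zero, so each t-statistic is the coefficient divided by its standard error.
  • The F-statistic is the ratio of the mean square of the model (the explained sum of squares divided by the model degrees of freedom) to the mean square of the residuals (the residual sum of squares divided by the residual degrees of freedom). It tests whether the independent variables are jointly significant.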
  • AIC stands for Akaike Information Criterion, which estimates the relative quality of statistical models for a given dataset.
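  • BIC stands for Bayesian Information Criterion, which is used as a criterion for model selection among a finite set of models. BIC is similar to AIC, but its penalty per parameter is ln(n), where n is the number of observations, instead of AIC’s 2. It therefore penalizes additional parameters more heavily than AIC whenever n is 8 or more.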

Conclusion

Here we worked through a quick overview of OLS using Statsmodels and its implementation in a Jupyter Notebook with sample data. I hope you liked it and will give OLS a try for your regression problems.

You can find the code and the data here.

Happy Machine Learning :)

Editor’s Note: Heartbeat is a contributor-driven online publication and community dedicated to exploring the emerging intersection of mobile app development and machine learning. We’re committed to supporting and inspiring developers and engineers from all walks of life.