Learn with examples and projects
It is very important for data scientists to learn statistics well. Learning visualization and data manipulation tools is great! But without a knowledge of statistics, it is not possible to infer real information from the data.
I wrote several tutorials on different inferential statistics topics. I then realized that, combined, they make a nice course for learners. Every article except the first works through a project with a dataset, and learning by doing a project is a great way to learn.
At the same time, you will find some solid examples to build your own portfolio.
Here I will start with the basics and move towards more complicated topics.
I suggest following the sequence, because the later articles assume that you know the basics.
Descriptive statistics includes basic ideas like the mean, median, standard deviation, minimum, maximum, interquartile range, and so on. The following article explains all these concepts and more in detail. I suggest going through this article before you dive deeper:
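As a quick preview of these measures, here is a minimal sketch in base R. It assumes the built-in `mtcars` dataset purely for illustration, which is not necessarily the data used in the article, and the variable names are my own.

```r
# Descriptive statistics for one numeric column of the built-in mtcars data
# (mtcars is an illustrative choice, not the article's dataset)
mpg <- mtcars$mpg

desc <- c(
  mean   = mean(mpg),    # arithmetic average
  median = median(mpg),  # middle value
  sd     = sd(mpg),      # sample standard deviation
  min    = min(mpg),
  max    = max(mpg),
  iqr    = IQR(mpg)      # 75th percentile minus 25th percentile
)

print(round(desc, 2))
```

Calling `summary(mpg)` gives a similar overview in one line, although it does not report the standard deviation.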
Probability distributions are one of the foundations of statistical testing methods, some predictive models, and machine learning algorithms. There are many different types of probability distributions, and each works under different conditions. The following article explains some commonly used discrete probability distributions, such as the Uniform, Binomial, Hypergeometric, Geometric, Negative Binomial, and Poisson distributions, and some continuous probability distributions, such as the Uniform, Normal, and Exponential distributions, with formulas, R functions, and examples of use cases:
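To give a feel for the R functions involved, here is a small sketch. Base R names them with a prefix (`d` for density or mass, `p` for cumulative probability, `q` for quantile, `r` for random draws) followed by the distribution name. The parameter values below are arbitrary illustrations, not taken from the article.

```r
# Binomial: probability of exactly 3 successes in 10 trials with p = 0.5
p_exact <- dbinom(3, size = 10, prob = 0.5)

# Poisson: probability of at most 2 events when the mean count is 4
p_atmost <- ppois(2, lambda = 4)

# Normal: value below which 97.5% of a standard normal distribution lies
z_975 <- qnorm(0.975)

# Exponential: five random draws with rate 2 (seed fixed for repeatability)
set.seed(1)
draws <- rexp(5, rate = 2)

print(c(p_exact = p_exact, p_atmost = p_atmost, z_975 = z_975))
print(draws)
```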
Confidence Interval and Hypothesis Testing
When we conduct research on a population, we usually cannot afford to study the whole population. Most of the time, a sample is taken, and we infer population parameters such as the population mean or population proportion from that sample. Confidence intervals and hypothesis testing procedures have long been used for this purpose. It is much easier to perform those tests using R. This article explains the process of performing some hypothesis tests and constructing confidence intervals with a lot of examples.
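As a rough illustration of how compact this is in R, the sketch below runs a one-sample t-test for a mean and a one-sample test for a proportion. It assumes the built-in `mtcars` data, and the null values (20 and 0.5) are arbitrary choices for demonstration, not values from the article.

```r
# One-sample t-test: is the mean mpg different from 20?
# (20 is an arbitrary illustrative null value)
tt <- t.test(mtcars$mpg, mu = 20, conf.level = 0.95)
print(tt$conf.int)  # 95% confidence interval for the population mean
print(tt$p.value)

# One-sample proportion test: is the share of manual cars different from 0.5?
n_manual <- sum(mtcars$am == 1)
pt <- prop.test(x = n_manual, n = nrow(mtcars), p = 0.5)
print(pt$conf.int)  # 95% confidence interval for the population proportion
print(pt$p.value)
```

Both functions return the confidence interval and the p-value together, so a single call covers the interval estimate and the test.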
Predictive models are a more advanced part of statistics. Many machine learning models are built on simple statistical models. The most basic predictive model is the linear regression model. The next two articles explain the theory, formulas, R functions, and the process of inference for simple and multiple linear regression models in detail, with some real projects:
- Detailed Explanation of Simple Linear Regression, Assessment and, Inference with ANOVA
- Detailed Guide to Multiple Linear Regression Model, Assessment, and Inference in R
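For a first look at what these two articles build on, here is a minimal sketch of fitting a simple and a multiple linear regression with `lm()`. The `mtcars` dataset and the chosen predictors are assumptions made for illustration only. The articles use their own projects.

```r
# Simple linear regression: fuel efficiency explained by weight
simple_fit <- lm(mpg ~ wt, data = mtcars)

# Multiple linear regression: add horsepower as a second predictor
multi_fit <- lm(mpg ~ wt + hp, data = mtcars)

print(coef(simple_fit))     # intercept and slope
print(anova(simple_fit))    # ANOVA table for the simple model
print(summary(multi_fit))   # coefficients, standard errors, t-tests, R-squared
print(confint(multi_fit))   # 95% confidence intervals for the coefficients
```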
Logistic regression is a slightly more advanced predictive model. Like linear regression, it is based on the simple straight-line formula, but with a modification. It is used to predict categorical variables. Although many more advanced predictive models have been developed, logistic regression is still one of the most widely used and effective models. Here is a link that explains logistic regression in detail with a project:
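To make the "modification" concrete: in the standard formulation, the straight line models the log-odds of the outcome rather than the outcome itself, so predictions stay between 0 and 1. The sketch below fits such a model with `glm()`. The `mtcars` data, the outcome `am`, and the predictor `wt` are illustrative assumptions, not the article's project.

```r
# Logistic regression: probability that a car has a manual transmission
# (am = 1) as a function of its weight
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)

print(coef(logit_fit))  # intercept and slope on the log-odds scale

# Predicted probability of a manual transmission for a hypothetical car
# weighing 3 (thousand lbs); type = "response" converts log-odds to probability
new_car <- data.frame(wt = 3)
p_manual <- predict(logit_fit, newdata = new_car, type = "response")
print(p_manual)
```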
ANOVA and ANCOVA
These are processes for determining differences in means. If you are not familiar with the concepts of inferential statistics, you may think we can simply take the means of different samples and look at the difference. But when we do not have access to the whole population and can only afford to work with a sample, just taking the difference of the individual means is not enough. ANOVA (Analysis of Variance) and ANCOVA (Analysis of Covariance) are processes in which variance plays a vital role in finding the differences. Here is the detailed tutorial that explains these concepts with a project:
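As a small preview, the sketch below runs a one-way ANOVA and a simple ANCOVA with `aov()`. It assumes the built-in `ToothGrowth` dataset (tooth length `len`, supplement type `supp`, and numeric `dose`), which is an illustrative choice and not necessarily the tutorial's data.

```r
# One-way ANOVA: does mean tooth length differ between supplement types?
anova_fit <- aov(len ~ supp, data = ToothGrowth)
print(summary(anova_fit))

# ANCOVA: the same comparison, adjusting for dose as a numeric covariate
ancova_fit <- aov(len ~ dose + supp, data = ToothGrowth)
print(summary(ancova_fit))

# Extract the p-value for the supplement effect from the ANCOVA table
ancova_table <- summary(ancova_fit)[[1]]
supp_row <- trimws(rownames(ancova_table)) == "supp"
p_supp <- ancova_table[supp_row, "Pr(>F)"]
print(p_supp)
```

The covariate is listed before the factor because `aov()` uses sequential sums of squares, so the supplement effect is assessed after accounting for dose.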
If you came to this article to learn and are working through these tutorials, you should practice with your own data as well. That’s the only way to learn. It may take some time to grasp these ideas. But if you are an aspiring data scientist, you cannot avoid inferential statistics.
- Text Files Processing, Cleaning, and Classification of Documents in R
- A Complete Beginners Guide to Regular Expressions in R
- A Complete Beginners Guide to Data Visualization in ggplot2
A Complete Free Course on Inferential Statistics for Data Scientists in R was originally published in Towards Data Science on Medium.