Ensemble learning

Understand ensemble learning in one article (bagging, boosting, and their four differences)

In machine learning we have covered many different algorithms, each of which works like a solo hero. Ensemble learning forms these heroes into a team, in the spirit of the proverb "three cobblers with their wits combined equal one master strategist".

This article introduces the two main ideas of ensemble learning: bagging and boosting.

What is ensemble learning?

Ensemble learning is a branch of machine learning. It is a "training idea" rather than a specific method or algorithm.

In everyday life we all know that "many hands make light work" and that "three cobblers with their wits combined equal one master strategist". The core idea of ensemble learning is exactly this strength in numbers: instead of inventing new algorithms, it combines existing ones to obtain better results.

Ensemble learning selects a number of simple base models and assembles them. There are two main ways to assemble these base models:

  1. Bagging (short for bootstrap aggregating)
  2. Boosting

Bagging

The core idea of bagging is democracy.

The idea behind bagging is that all base models are treated equally: each base model gets exactly one vote, and the final result is decided by democratic voting.

In most cases, the variance of the result obtained by bagging is smaller.
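A rough way to see why, under the simplifying assumption that the K base models make independent errors with equal variance σ² (real bootstrapped models are correlated, so the actual reduction is smaller):

$$\operatorname{Var}\!\left(\frac{1}{K}\sum_{k=1}^{K} f_k(x)\right) = \frac{\sigma^2}{K}$$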

Specific process:

  1. Training sets are drawn from the original sample set. In each round, bootstrapping is used to draw N training samples from the original set with replacement (some samples may be drawn multiple times, while others may never be drawn). K rounds of sampling are carried out, yielding K training sets that are independent of each other.
  2. One model is trained on each training set, so K training sets give K models. (Note: no specific classification or regression algorithm is prescribed here; different methods can be used depending on the problem, such as decision trees or perceptrons.)
  3. For classification problems, the K models obtained in the previous step vote to produce the final class; for regression problems, the average of the K models' outputs is taken as the final result. (All models carry the same importance.) A code sketch of the full procedure follows this list.
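A minimal sketch of this procedure in Python, using scikit-learn decision trees as the base model; the function names, the default K, and the assumption of integer class labels are illustrative choices, not part of the original text:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, K=10, seed=0):
    """Train K base models, each on a bootstrap sample of the original training set."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(K):
        idx = rng.integers(0, n, size=n)          # draw N samples with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Classification: every model gets one equal vote; return the majority class."""
    votes = np.stack([m.predict(X) for m in models])   # shape (K, n_samples), integer labels
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```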

An example:

Among bagging methods, the most widely known is the random forest: bagging + decision trees = random forest.

Understanding decision tree (3 steps + 3 typical algorithms + 10 advantages and disadvantages)

Understanding random forest in one article (4 steps + 4 ways of evaluation + 10 advantages and disadvantages)
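As a hedged usage sketch (the toy dataset and parameter values here are illustrative only), scikit-learn's RandomForestClassifier implements exactly this bagging-of-decision-trees idea, with the extra trick of considering a random subset of features at each split:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 decision trees, each trained on a bootstrap sample; prediction is a majority vote
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```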

Boosting

The core idea of boosting is to select the elites.

The essential difference between boosting and bagging is that boosting does not treat all base models equally. Instead, it picks out the "elite" through repeated testing and screening, gives the elite more voting rights, gives the poorly performing base models fewer voting rights, and then combines everyone's votes to produce the final result.

In most cases, the bias of the result obtained by boosting is smaller.

Specific process:

  1. The base models are combined linearly to form an additive model.
  2. In each round of training, the weight of base models with low error rates is increased and the weight of models with high error rates is decreased.
  3. The weights (or probability distribution) of the training data are also changed in each round: the weights of samples misclassified by the previous round's weak classifier are increased, and the weights of correctly classified samples are decreased, so that later classifiers focus on the data that was misclassified. (An AdaBoost-style sketch of these steps follows this list.)
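A minimal AdaBoost-style sketch of these three steps, assuming binary labels in {-1, +1}; the dataset, number of rounds, and helper names are illustrative assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, rounds=50):
    """y must be in {-1, +1}. Returns the weak models and their voting weights (alphas)."""
    n = len(X)
    w = np.full(n, 1.0 / n)                         # start with uniform sample weights
    models, alphas = [], []
    for _ in range(rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)       # low error rate -> larger voting weight
        w *= np.exp(-alpha * y * pred)              # up-weight misclassified samples
        w /= w.sum()                                # keep weights a probability distribution
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    """Additive model: sign of the alpha-weighted sum of the weak classifiers' outputs."""
    return np.sign(sum(a * m.predict(X) for a, m in zip(models, alphas)))
```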

An example:

Among the boosting methods, AdaBoost and gradient boosting are relatively popular.

Understanding AdaBoost and its advantages and disadvantages
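For reference, a usage sketch of both with scikit-learn (the dataset and parameters are illustrative assumptions, not prescribed by the original text):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

ada = AdaBoostClassifier(n_estimators=100, random_state=0)
gbt = GradientBoostingClassifier(n_estimators=100, random_state=0)

print("AdaBoost accuracy:", cross_val_score(ada, X, y, cv=5).mean())
print("Gradient boosting accuracy:", cross_val_score(gbt, X, y, cv=5).mean())
```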

Four differences between bagging and boosting

Sample selection:

Bagging: training sets are sampled from the original set with replacement, and the training sets selected in different rounds are independent of each other.

Boosting: the training set stays the same in every round, but the weight of each sample changes; the weights are adjusted according to the classification results of the previous round.

Sample weight:

Bagging: uniform sampling is used, so every sample has equal weight.

Boosting: sample weights are adjusted according to the errors; the more often a sample is misclassified, the greater its weight becomes.

Prediction function:

Bagging: all prediction functions have equal weight.

Boosting: each weak classifier has its own weight, and classifiers with smaller classification error receive larger weights.

Parallel computing:

Bagging: the prediction functions can be generated in parallel.

Boosting: the prediction functions can only be generated sequentially, because each model's parameters depend on the results of the previous model.
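This difference shows up directly in library interfaces. As an illustrative scikit-learn example (the parameter values are assumptions), a bagging-style ensemble can spread its independent trees across CPU cores, while a boosting implementation fits its trees one after another:

```python
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Bagging-style: the 200 trees are independent of each other, so training can run in parallel
forest = RandomForestClassifier(n_estimators=200, n_jobs=-1)

# Boosting: each tree is fit to the errors of the ensemble so far, so the boosting loop
# itself is sequential (GradientBoostingClassifier has no n_jobs parameter)
booster = GradientBoostingClassifier(n_estimators=200)
```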

The "four differences" section is adapted from "The concept and difference of bagging and boosting".