Get to grips with building robust XGBoost models using Python and scikit-learn for deployment
- Get up and running with machine learning and understand how to boost models with XGBoost in no time
- Build real-world machine learning pipelines and fine-tune hyperparameters to achieve optimal results
- Discover tips and tricks and gain innovative insights from XGBoost Kaggle winners
XGBoost is an industry-proven, open source software library that provides a gradient boosting framework capable of scaling to billions of data points quickly and efficiently.
The book starts with an introduction to machine learning and XGBoost before gradually moving on to gradient boosting. You’ll cover decision trees in detail and analyze bagging in the machine learning context. You’ll then learn how to build gradient boosting models from scratch and extend gradient boosting to big data while recognizing its limitations. The book also shows you how to implement fast and accurate machine learning models using XGBoost and scikit-learn and takes you through advanced XGBoost techniques by focusing on speed enhancements, deriving parameters mathematically, and building robust models. With the help of detailed case studies, you’ll practice building and fine-tuning regressors and classifiers and become familiar with new tools such as feature importance and the confusion matrix. Finally, you’ll explore alternative base learners, learn invaluable Kaggle tricks such as building non-correlated ensembles and stacking, and prepare XGBoost models for industry deployment with unique transformers and pipelines.
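To give a flavor of what building gradient boosting from scratch can involve, here is a minimal sketch in plain Python. It assumes squared-error loss, a single numeric feature, and one-split decision stumps as base learners. The toy data and every function name (`fit_stump`, `fit_gradient_boosting`, and so on) are illustrative assumptions; they are not taken from the book or from the XGBoost API, and XGBoost itself adds regularization and many optimizations not shown here.

```python
# Illustrative sketch only: gradient boosting for regression with squared-error
# loss, using one-split decision stumps on a single numeric feature.
# All names and data below are hypothetical, not the book's or XGBoost's code.


def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) minimizing squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    candidates = sorted(set(xs))
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    """Predict with a single stump."""
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit stumps sequentially, each one to the current residuals."""
    base = sum(ys) / len(ys)  # initial prediction: the mean target
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * predict_stump(stump, x)
            for p, x in zip(predictions, xs)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base prediction and the shrunken stump outputs."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy data, made up for this example.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1]

model = fit_gradient_boosting(xs, ys)
baseline_mse = mean_squared_error(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mean_squared_error(ys, [predict(model, x) for x in xs])
print("baseline training MSE:", baseline_mse)
print("boosted training MSE:", boosted_mse)
```

Each round fits a new stump to the errors left by the previous rounds, and the learning rate shrinks its contribution, so the training error falls gradually rather than in one step. The errors printed here are training errors only; a held-out set would be needed to judge generalization.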
By the end of the book, you’ll be able to build high-performing machine learning models using XGBoost with minimal errors and maximum speed.
What you will learn
- Build machine learning bagging and boosting models
- Develop XGBoost regressors and classifiers with impressive accuracy and speed
- Find out how to analyze variance and bias in machine learning
- Compare XGBoost’s results to decision trees, random forests, and gradient boosting
- Visualize tree-based models and use machine learning to determine the most important features of a dataset
- Implement robust XGBoost models ready for industry deployment
- Build non-correlated ensembles and stack XGBoost models to increase accuracy
Who this book is for
This book is for data science professionals and enthusiasts, data analysts, and developers who want to build fast and accurate machine learning models that scale with big data. Proficiency in Python and a basic understanding of linear algebra will help you to get the most out of this book.