Photo by Samrat Khadka on Unsplash
Overview
Estimated Reading Time
Plan on around 90–120 minutes for this preparation, which consists of a mix of textbook reading and online material.
The objective of this module is to provide a real-world scenario in which you can practice the following data science / machine learning skills:
- Gradient Boosted Trees and the XGBoost Library
- Evaluating how well a model carries out regression
Preparation Reading
Model Ensembles
First, read these sections from your textbook:
- Read Section 8.4.5 of your text (Performance Measures: Continuous Targets); a short code sketch of these measures appears after this list
- Read Section 4.4.5 of your text (Model Ensembles)
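If you want to experiment alongside the reading, the sketch below shows how the common continuous-target performance measures covered in Section 8.4.5 (MAE, MSE/RMSE, and R²) can be computed. It assumes scikit-learn is installed, and the toy targets and predictions are made up purely for illustration; the function names come from scikit-learn's metrics module, not from the textbook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy ground-truth targets and model predictions (made up for illustration).
y_true = np.array([3.0, 5.0, 2.5, 7.0, 4.5])
y_pred = np.array([2.8, 5.4, 2.0, 6.5, 5.0])

mae = mean_absolute_error(y_true, y_pred)   # average absolute error
mse = mean_squared_error(y_true, y_pred)    # average squared error
rmse = np.sqrt(mse)                         # same units as the target
r2 = r2_score(y_true, y_pred)               # 1.0 is perfect, 0.0 matches "predict the mean"

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R^2={r2:.3f}")
```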
Gradient Boosted Trees
Four videos are listed below.
The first video explains the concepts of gradient boosted trees within the context of regression tasks. The second explains the mathematics behind those concepts.
The third video explains the concepts of gradient boosted trees within the context of classification tasks. The fourth explains the mathematics behind those concepts.
It's not essential that you master the mathematics, but try your best to follow along; the videos do a really good job of explaining what some of the stickier bits of notation represent.
(Don't let the corny music at the start dissuade you; they're really good videos.)
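If it helps to see the idea from the first two videos in code, here is a minimal sketch of gradient boosting for regression with squared error: start from the mean prediction, then repeatedly fit a small tree to the current residuals and add a scaled copy of its output to the model. It uses scikit-learn's DecisionTreeRegressor as the weak learner on made-up data; it illustrates the concept rather than reproducing the videos' exact example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)   # noisy toy target

learning_rate = 0.1
n_trees = 100
trees = []

# Step 1: the initial prediction is just the mean of the target.
base = y.mean()
prediction = np.full_like(y, base)

for _ in range(n_trees):
    # Step 2: for squared error, the residuals are the negative gradient.
    residuals = y - prediction
    # Step 3: fit a small tree to the residuals...
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)
    trees.append(tree)
    # Step 4: ...and add a scaled-down copy of its output to the model.
    prediction += learning_rate * tree.predict(X)

def predict(X_new):
    # Predicting on new data replays the same sum of scaled trees.
    out = np.full(len(X_new), base)
    for tree in trees:
        out += learning_rate * tree.predict(X_new)
    return out

print("train MSE:", np.mean((y - predict(X)) ** 2))
```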
Optional Video
The following video completes the series. It does an excellent job of walking through the derivation of the math used in the classification example, but it goes deeper than this course requires.
You might consider watching it with the intent to understand the approach at a high level rather than worrying about the details of every expression.
XGBoost
You may find the XGBoost documentation useful, particularly the sections on the sklearn wrapper:
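As a starting point, here is a minimal sketch of the sklearn-style wrapper in action: XGBRegressor exposes the familiar fit/predict interface, so it drops into the usual train/test workflow. The dataset is synthetic and the hyperparameter values are arbitrary placeholders, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from xgboost import XGBRegressor

# Synthetic regression data standing in for a real dataset.
X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The sklearn wrapper: construct, fit, predict, just like any sklearn estimator.
model = XGBRegressor(
    n_estimators=200,      # number of boosted trees
    learning_rate=0.1,     # shrinkage applied to each tree's contribution
    max_depth=3,           # depth of the individual trees
    random_state=42,
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("test MAE:", mean_absolute_error(y_test, y_pred))
```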
Johnny, the Data Science Intern, drops by your hotel room around midnight:
Okay, just one last thing: if you need any more help at all, I put together this collection of Google Colab notebooks that might be useful.