CSE 450 - Machine Learning & Data Mining

Module 03 — Gradient Boosted Trees

Stacked Rocks Photo by Samrat Khadka on Unsplash

Overview

Estimated Reading Time

Plan on about 90 to 120 minutes for this preparation reading, which combines textbook and online material.

The objective of this module is to provide a real-world scenario in which you can practice the following data science / machine learning skills:

Preparation Reading

Model Ensembles

First, read this section from your textbook:

Gradient Boosted Trees

Four videos are listed below.

The first video explains the concepts of gradient boosted trees within the context of regression tasks. The second explains the mathematics behind those concepts.
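To make the regression idea concrete, here is a minimal from-scratch sketch for squared-error loss. It assumes NumPy and scikit-learn are installed and uses scikit-learn's `DecisionTreeRegressor` as the small tree fitted at each step. The helper names `fit_gbr` and `predict_gbr` are hypothetical. It is a study aid, not how production libraries implement boosting.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gbr(X, y, n_trees=100, learning_rate=0.1, max_leaf_nodes=8):
    """Hypothetical helper: gradient boosted regression with squared-error loss."""
    # Step 1: the initial prediction for every row is the mean of y
    f0 = float(np.mean(y))
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        # Step 2: pseudo-residuals; for squared error these are observed - predicted
        residuals = y - pred
        # Step 3: fit a small regression tree to the residuals (leaf value = mean residual)
        tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
        tree.fit(X, residuals)
        # Step 4: move the predictions a small, scaled step toward the residuals
        pred = pred + learning_rate * tree.predict(X)
        trees.append(tree)
    return {"f0": f0, "learning_rate": learning_rate, "trees": trees}


def predict_gbr(model, X):
    """Start from the initial prediction and add every tree's scaled output."""
    pred = np.full(X.shape[0], model["f0"])
    for tree in model["trees"]:
        pred = pred + model["learning_rate"] * tree.predict(X)
    return pred


# Usage: learn a noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

model = fit_gbr(X, y)
train_mse = float(np.mean((predict_gbr(model, X) - y) ** 2))
baseline_mse = float(np.var(y))  # error of always predicting the mean
```

The learning rate deliberately keeps each tree's contribution small. Many small corrections tend to generalize better than a few large ones, which is the trade-off the video discusses.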

The third video explains the concepts of gradient boosted trees within the context of classification tasks. The fourth explains the mathematics behind those concepts.
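A comparable sketch for binary classification with log loss follows. Predictions are kept in log(odds) units, and the pseudo-residuals are the observed label minus the predicted probability. Each leaf's output is set to sum(residuals) / sum(p * (1 - p)) over the rows that land in it. As before, this assumes NumPy and scikit-learn, and `fit_gbc`, `predict_proba_gbc`, and the other helpers are hypothetical names.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def leaf_outputs(tree, X, residuals, prob):
    """Map each leaf id to sum(residuals) / sum(p * (1 - p)) over its rows."""
    leaves = tree.apply(X)
    outputs = {}
    for leaf in np.unique(leaves):
        in_leaf = leaves == leaf
        denom = np.sum(prob[in_leaf] * (1.0 - prob[in_leaf]))
        outputs[leaf] = np.sum(residuals[in_leaf]) / max(denom, 1e-12)  # guard against /0
    return outputs


def tree_contribution(tree, outputs, X):
    """Look up the custom output value for the leaf each row lands in."""
    return np.array([outputs[leaf] for leaf in tree.apply(X)])


def fit_gbc(X, y, n_trees=50, learning_rate=0.1, max_leaf_nodes=8):
    """Hypothetical helper: binary gradient boosting with log loss; y holds 0/1 labels."""
    p0 = float(np.mean(y))
    f0 = np.log(p0 / (1.0 - p0))  # initial prediction is the log(odds) of class 1
    log_odds = np.full(len(y), f0)
    stages = []
    for _ in range(n_trees):
        prob = sigmoid(log_odds)
        residuals = y - prob  # observed label minus predicted probability
        tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
        tree.fit(X, residuals)
        outputs = leaf_outputs(tree, X, residuals, prob)
        log_odds = log_odds + learning_rate * tree_contribution(tree, outputs, X)
        stages.append((tree, outputs))
    return {"f0": f0, "learning_rate": learning_rate, "stages": stages}


def predict_proba_gbc(model, X):
    """Sum the scaled leaf outputs in log(odds) space, then convert to probabilities."""
    log_odds = np.full(X.shape[0], model["f0"])
    for tree, outputs in model["stages"]:
        log_odds = log_odds + model["learning_rate"] * tree_contribution(tree, outputs, X)
    return sigmoid(log_odds)


# Usage: two features, class 1 when their sum is positive
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

model = fit_gbc(X, y)
proba = predict_proba_gbc(model, X)
train_accuracy = float(np.mean((proba > 0.5) == y))
```

The trees are still regression trees. Only the residual definition and the leaf output formula change, and the final log(odds) is passed through the sigmoid to get a probability.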

You don't need to master the mathematics, but try your best to follow along. The videos do a very good job of explaining what some of the trickier bits of notation represent.

(Don't let the corny music at the start put you off; they're really good videos.)
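As a study aid, here is the usual statement of the gradient boosting algorithm. It assumes n training rows, a differentiable loss L, M trees, and a learning rate ν. The videos may use slightly different symbols, so treat this as a summary rather than the course's official notation.

```latex
\begin{aligned}
&F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma) \\
&\text{for } m = 1, \dots, M: \\
&\quad r_{im} = -\left[\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F = F_{m-1}}, \quad i = 1, \dots, n \\
&\quad \text{fit a regression tree to } \{(x_i, r_{im})\}, \text{ giving leaves } R_{jm},\ j = 1, \dots, J_m \\
&\quad \gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L\big(y_i, F_{m-1}(x_i) + \gamma\big) \\
&\quad F_m(x) = F_{m-1}(x) + \nu \sum_{j=1}^{J_m} \gamma_{jm}\, \mathbf{1}(x \in R_{jm})
\end{aligned}
```

For the regression case with L = ½(y − F)², these steps reduce to familiar quantities. F₀ is the mean of y, each pseudo-residual is y − F_{m−1}(x), and each leaf output γ_jm is the mean residual in that leaf.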

Optional Video

The following video completes the series. It's very good and does an excellent job of walking through the derivation of the math used in the classification example, but it goes deeper than this course requires.

You might consider watching it with the intent to understand the approach at a high level rather than worrying about the details of every expression.
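The key result can be summarized in a few lines. This is a sketch of the standard second-order (Newton) argument for binary log loss, written in terms of the log(odds) F. Here p_i is the predicted probability for row i after m − 1 trees, and R_jm is the j-th leaf of the m-th tree. The video's notation may differ.

```latex
\begin{aligned}
p_i &= \frac{1}{1 + e^{-F_{m-1}(x_i)}}, \qquad
L(y_i, F) = -y_i F + \log\!\left(1 + e^{F}\right) \\
\frac{\partial L}{\partial F}\bigg|_{F_{m-1}(x_i)} &= p_i - y_i, \qquad
\frac{\partial^2 L}{\partial F^2}\bigg|_{F_{m-1}(x_i)} = p_i (1 - p_i) \\
\sum_{x_i \in R_{jm}} L\big(y_i, F_{m-1}(x_i) + \gamma\big)
&\approx \sum_{x_i \in R_{jm}} \Big[ L\big(y_i, F_{m-1}(x_i)\big) + (p_i - y_i)\,\gamma + \tfrac{1}{2}\, p_i (1 - p_i)\, \gamma^2 \Big] \\
\gamma_{jm} &= \frac{\sum_{x_i \in R_{jm}} (y_i - p_i)}{\sum_{x_i \in R_{jm}} p_i (1 - p_i)}
\end{aligned}
```

The last line comes from setting the derivative of the quadratic approximation with respect to γ to zero. The same first-derivative condition applied to a single constant gives the initial prediction F₀ = log(ȳ / (1 − ȳ)), the log(odds) of the positive class.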

XGBoost

You may find the XGBoost documentation useful, particularly the sections on the scikit-learn (sklearn) wrapper:

Johnny, the Data Science Intern, drops by your hotel room around midnight:

Okay, just one last thing: if you need any more help at all, I've put together this collection of Google Colab notebooks that might be useful.


  1. Data Science Intern photo by Fábio Lucas on Unsplash