Module 03 - Overview
This week you'll learn about gradient boosted trees and use the XGBoost library to apply them to regression tasks.
Subject to Change
Keep in mind that your instructor may deviate somewhat from the following guide, and they have final say on assignment requirements, delivery methods, and due dates. So be sure to pay attention to both in-class and Canvas announcements.
Module 03 Assignments
For your convenience, here are links to the module 03 readings and assignments:
Readings
Data
- Housing Dataset
- Housing Data Dictionary
- Housing Holdout Dataset
- Housing Holdout MINI Dataset
- Google Colab Notebook
Using XGBoost
If you've read through the official documentation and tutorials about XGBoost on the project page and still aren't sure how to use it, this Colab notebook might help.
Holdout Mini Dataset
This module has a mini holdout dataset. You can test your model against it as many times as you'd like. Use it to:
- Verify that your CSV is in the correct format
- Verify that your model is making good predictions
- Give you an idea of what your grade might be on the final holdout set
Once your team is confident your model has been adequately trained, load the data from the mini holdout dataset and make predictions on it. Save the predictions as a CSV.
Then open and run this Colab notebook, following the instructions at the top of the notebook.
Don't Forget:
- Apply the same transformations to the holdout dataset that you applied to your training data (MinMax scaling, adding or removing features, etc.)
- Do NOT remove any rows from the holdout dataset
- Do NOT sort or shuffle the holdout dataset
- You can test against the mini holdout as many times as you'd like, but be careful not to overfit to it. Ideally, your results on the mini holdout should be similar to your results on the test dataset you created.