CSE 450 - Machine Learning

Module 02 — Decision Trees

Clustering Photo by veeterzy on Unsplash

Overview

Estimated Reading Time

Plan on around 90 - 120 minutes for this preparation reading, which consists of textbook reading.

There are also some optional online materials you may want to review.

The objective of this module is to provide a real-world scenario in which you can practice the following data science / machine learning skills:

Preparation Reading

Reading the Text

As you read the chapters in the textbook, you may find it useful to start with the chapter summary at the end of the chapter.

This summary provides an overview of the key concepts presented in the chapter and shows how they are connected.

Sensitivity and Specificity

In section 8.4.2.1 (Confusion Matrix-based Performance Measures), you'll see the terms "True Positive Rate" and "True Negative Rate" (among others).

It's more common to refer to "True Positive Rate" as Sensitivity.

It is also more common to refer to "True Negative Rate" as Specificity.

The text mentions this in a footnote, but these terms are so common that we want to make sure you don't miss the definitions.
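
To make the two terms concrete, here is a minimal Python sketch that computes them from a list of actual and predicted labels. Sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The function name `sensitivity_specificity` and the labels below are made up for illustration; they do not come from the textbook.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for a binary classification result."""
    tp = fn = tn = fp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1  # true positive: positive case predicted positive
            else:
                fn += 1  # false negative: positive case predicted negative
        else:
            if predicted == positive:
                fp += 1  # false positive: negative case predicted positive
            else:
                tn += 1  # true negative: negative case predicted negative

    # Sensitivity = True Positive Rate; Specificity = True Negative Rate.
    # If a class is absent from y_true, the rate is undefined, so return NaN.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up example labels (1 = positive, 0 = negative)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"Sensitivity: {sens:.3f}")
print(f"Specificity: {spec:.3f}")
```

In this made-up data, 3 of the 4 positive cases and 4 of the 6 negative cases are predicted correctly, so the sketch reports a sensitivity of 0.75 and a specificity of about 0.667.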

Complete the following preparation reading:

Decision Trees

You can find documentation on how to use decision trees with scikit-learn on these pages:

Extra Help

Below you'll find some optional videos and other resources that help supplement the reading.

You should absolutely still do the reading above. One technique would be to read the text, paying particular attention to new concepts (usually written in bold), then research those concepts using videos or other articles until you're confident you understand them. Afterwards, circle back to the text to pick up extra details you might have missed the first time.

Learning Complex Technical Information

Reading technical information can be difficult. It is an acquired skill that you absolutely should develop if you're planning to work in data science: new research papers and algorithms are released constantly in this field, and keeping up with them requires you to parse dense information and formulas.

Working through that material helps you understand not only how an algorithm works, but also which types of problems it would and would not be suited for.

However, sometimes it's nice to have a different perspective. Some people learn better visually, through videos, interactively, or by example.

In some cases, a superficial understanding of an algorithm and its parameters may be good enough for what you need to do. But you'll always benefit from a deeper understanding of how the tools and algorithms you're using actually work, and the reasons they behave better in some situations than others.

Decision Trees

Shannon's Entropy Model

Entropy Formula Image from The Intuition behind Shannon’s Entropy
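
For reference, Shannon's entropy of a set of class labels, where p_i is the fraction of items belonging to class i, is H = -Σ p_i log2(p_i), measured in bits. The following is a minimal Python sketch of that formula, not the textbook's implementation; the function name `shannon_entropy` and the example labels are made up for illustration.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Return the Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # p * log2(1 / p) is the same as -p * log2(p), summed over each class
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Made-up example labels
print(shannon_entropy(["yes", "yes", "no", "no"]))    # evenly mixed set
print(shannon_entropy(["yes", "yes", "yes", "yes"]))  # pure set
```

An evenly mixed two-class set has an entropy of 1 bit, while a set containing only one class has an entropy of 0. A decision tree that uses entropy as its impurity measure prefers splits that move the resulting subsets toward that second case.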

Model Evaluation

Johnny, the Data Science Intern, drops by your hotel room around midnight:

Okay, just one last thing: if you need any more help at all, I put together this collection of Google Colab notebooks that might be useful.


  1. Data Science Intern photo by Fábio Lucas on Unsplash