CSE 450 - Machine Learning

Module 02: Overview

This week you'll learn about supervised learning, decision trees, and more about model evaluation.

Subject to Change

Keep in mind that your instructor may deviate somewhat from the following guide, and they have final say on assignment requirements, delivery methods, and due dates. So be sure to pay attention to both in-class and Canvas announcements.

Module 02 Assignments

For your convenience, here are links to the module 02 readings and assignments:

Readings

Data

Use a decision tree

If you're really struggling with how to make a decision tree, you should try reading through the documentation a couple more times.

If that still doesn't help, this notebook can give you some more guidance:

Open In Colab
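As a rough illustration of the steps involved, here is a minimal sketch using scikit-learn's DecisionTreeClassifier. The tiny table, its column names (age, balance, y), and the max_depth value are made up for the example; they are not the module's dataset or required settings.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; substitute the module's training data here.
data = pd.DataFrame({
    "age": [25, 40, 33, 58, 47, 29, 61, 36, 52, 44, 23, 38],
    "balance": [200, 1500, 300, 5200, 2500, 150, 4800, 700, 3900, 2100, 90, 1200],
    "y": ["no", "yes", "no", "yes", "yes", "no", "yes", "no", "yes", "yes", "no", "no"],
})
X = data[["age", "balance"]]  # features
y = data["y"]                 # target to predict

# Set aside a quarter of the rows to check the model on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# max_depth=3 is an arbitrary starting point, not a recommended value.
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)

predictions = clf.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy on the held-back rows: {accuracy:.2f}")
```

With real data, the same pattern applies after any cleaning and encoding steps: split, fit on the training rows, then score on the rows you held back.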

Use a random forest

If you want to try a different machine learning algorithm on this dataset, consider a random forest.

This notebook can give you an example of how to build a random forest:

Open In Colab
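For comparison, a random forest can be sketched with scikit-learn's RandomForestClassifier. The synthetic data from make_classification and the n_estimators value below are assumptions for illustration only; your own features and target would take their place.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption); swap in your prepared features and target.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=42
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# A forest averages the votes of many decision trees; 100 trees is an arbitrary choice.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

predictions = forest.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy on the held-back rows: {accuracy:.2f}")

# Importances sum to 1 and hint at which features the forest relied on most.
importances = forest.feature_importances_
print(importances)
```

Because the classifier exposes the same fit and predict methods as a decision tree, it can usually be dropped into existing code with few other changes.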

Holdout Dataset

Most modules include a "holdout dataset," which the instructor uses to see how well your model performs. This dataset has the same fields as the training set, except that the value you will be predicting has been removed. Your model will not have seen this data before, so the results indicate how well it has "learned" during training.

Once your team is confident your model has been adequately trained, load the data from the holdout dataset and make predictions on it. You are required to make predictions for ALL rows in the holdout dataset.

Don't Forget:

Holdout Mini Dataset

This module has a mini holdout dataset. You can test your model against this mini holdout dataset as many times as you'd like. It is here to

Once your team is confident your model has been adequately trained, load the data from the mini holdout dataset and make predictions on it. Save the predictions as a CSV.

Open and run this Colab notebook (follow the instructions at the top of the notebook).

Don't Forget:

Sample code to help with the mini holdout

import pandas as pd

# Load the mini holdout dataset
test = pd.read_csv("https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/bank_holdout_test_mini.csv")

# Apply the same transformations you applied to the training set

# clf is the model you have already trained
predictions = clf.predict(test)

# Convert the predictions to a dataframe and label the column 'y'
my_predictions = pd.DataFrame(predictions, columns=['y'])

# Replace PUTTEAMNUMBERHERE with your team number
my_predictions.to_csv("teamPUTTEAMNUMBERHERE-module2-predictions.csv", index=False)
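The "same transformations" step is where holdout predictions most often go wrong, so here is one possible sketch of it. It assumes the training features were one-hot encoded with pd.get_dummies; the small tables and their columns (age, job) are hypothetical stand-ins, and your own preprocessing may differ.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-ins for the training data and the holdout file.
train = pd.DataFrame({
    "age": [25, 40, 33, 58, 47, 29],
    "job": ["admin", "technician", "admin", "retired", "technician", "student"],
    "y": ["no", "yes", "no", "yes", "yes", "no"],
})
holdout = pd.DataFrame({
    "age": [31, 62, 45],
    "job": ["admin", "retired", "technician"],
})

# One-hot encode the training features; the resulting columns define the layout.
X_train = pd.get_dummies(train.drop(columns=["y"]))
clf = DecisionTreeClassifier(random_state=42).fit(X_train, train["y"])

# Encode the holdout the same way, then align it to the training columns.
# Categories missing from the holdout (here "student") become all-zero columns.
X_holdout = pd.get_dummies(holdout).reindex(columns=X_train.columns, fill_value=0)

predictions = clf.predict(X_holdout)
my_predictions = pd.DataFrame(predictions, columns=["y"])

# There should be exactly one prediction per holdout row.
print(len(my_predictions) == len(holdout))
```

Aligning on the training columns keeps the model from seeing a different number or order of features than it was trained on, and the final check confirms that every holdout row received a prediction.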

Templates

Hints and Helps