Overview
In this reading you'll learn about the k-Nearest Neighbors algorithm, as well as how to evaluate machine learning models.
Estimated Reading Time
Plan on around 60 to 90 minutes for this preparation reading, which consists of textbook reading.
Reading
Reading the Text
As you read the algorithm chapters in the textbook, you may find it useful to start with the chapter summary at the end of the chapter.
This summary gives an overview of the key concepts presented in the chapter and shows how they connect to one another.
Complete the following preparation reading:
- Read Chapter 5 until section 5.4.2 in the textbook, which introduces similarity-based learning and the nearest neighbors family of algorithms.
- Read sections 5.4.3 and 5.4.4 in the textbook, which discuss some things to consider when using a nearest neighbor algorithm.
- Read Chapter 8 until the beginning of section 8.4.3 of your textbook, which describes ways to verify how well a machine learning model works.
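If it helps to see the basic idea before you start the reading, here is a minimal sketch of a nearest neighbor classifier. It is not taken from the textbook: the function name `knn_predict`, the toy data, the choice of Euclidean distance, and the simple majority vote are all assumptions made for illustration, and the chapter covers other distance measures and refinements.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point (an assumed choice of metric)
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labeled "a" and "b"
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(points, labels, (1.1, 1.0), k=3)
print(prediction)
```

Notice that there is no separate training step: the "model" is just the stored data, and all the work happens when a prediction is requested.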
Extra Help
Below you'll find some optional videos and other resources that supplement the reading.
You should absolutely still do the reading above. One technique would be to read the text, paying particular attention to new concepts (usually written in bold), then research those concepts using videos or other articles until you're confident you understand them. Afterwards, circle back to the text to pick up extra details you might have missed the first time.
Learning Complex Technical Information
Reading technical information can be difficult, and it is an acquired skill that you absolutely should develop if you're planning to work in data science. New research papers and algorithms are released constantly in this field, and they require you to parse dense information and formulas.
Working through that material helps you understand not only how an algorithm works, but also which types of problems it is and is not suited for.
However, sometimes it's nice to have a different perspective. Some people learn better visually, through videos, interactively, or by example.
In some cases, a superficial understanding of an algorithm and its parameters may be good enough for what you need to do. But you'll always benefit from a deeper understanding of how the tools and algorithms you're using actually work, and the reasons they behave better in some situations than others.
Model Evaluation
- About Train, Validation and Test Sets in Machine Learning is an article that explains why we split the dataset up to evaluate our machine learning models.
- Accuracy, Precision, Recall or F1? explains a bit about each of these performance metrics, and when it is best to use each one.
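To make the four metrics concrete, here is a small sketch that computes them by hand for a binary classifier using their standard definitions. The function name `classification_metrics` and the example labels are made up for illustration and do not come from the textbook or the linked article.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    # Count the four cells of the confusion matrix
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 4 positives and 6 negatives, with one miss and one false alarm
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this example accuracy is higher than precision and recall because the negative class is larger and mostly predicted correctly, which is one reason accuracy alone can be misleading on imbalanced data.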