CSE 450 - Machine Learning

Data Sources

This course uses datasets in the public domain or under a compatible license. Information for each dataset may be found below:

Netflix Movies and TV Shows

A dataset listing TV shows and movies available on Netflix, along with some metadata about those offerings.

Cereal Nutrition

A dataset of nutritional information for major US cereal brands.

Titanic Passenger List

Information about Titanic passengers.

US Honey Production

This dataset is inspired by Honey Production in the USA and extends it to the period 1998-2017. It also incorporates data from the USGS Pesticide National Synthesis Project, allowing evaluation of the statistical connections between honey production and the use of neonicotinoid (neonic) pesticides.

Congressional Voting Record

Voting information for the 1984 US Congress.

HR Data

Modified version of a synthetic human resources dataset. Dr. Carla Patalano, “Human Resources Data Set.” Kaggle, doi: 10.34740/KAGGLE/DSV/774340.

Banking Data

Dataset based on Moro S, Cortez P, Rita P. A data-driven approach to predict the success of bank telemarketing. Decision Support Systems. 2014;62:22-31. doi:10.1016/j.dss.2014.03.001. Further augmented here

Spotify Data

Spotify song information.

Iris Data

Iris Flower Dataset

Housing Data

House Sales in King County, USA

Auto MPG Data

Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.

Biking Data

Fanaee-T, Hadi, and Gama, Joao, 'Event labeling combining ensemble detectors and background knowledge', Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg.

Laboratory of Artificial Intelligence and Decision Support (LIAAD), University of Porto, INESC Porto, Campus da FEUP, Rua Dr. Roberto Frias, 378, 4200-465 Porto, Portugal

Original Source: http://capitalbikeshare.com/system-data
Weather Information: http://www.freemeteo.com
Holiday Schedule: http://dchr.dc.gov/page/holiday-schedule

Jane Austen Texts

Project Gutenberg https://www.gutenberg.org

German Traffic Signs

J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel. The German Traffic Sign Recognition Benchmark: A multi-class classification competition. In Proceedings of the IEEE International Joint Conference on Neural Networks, pages 1453–1460. 2011.