Data Sources
This course uses datasets in the public domain or under a compatible license. Information for each dataset may be found below:
Netflix Movies and TV Shows
A dataset listing TV Shows and Movies available on Netflix, along with some metadata about those offerings.
- Original Source: Kaggle
- License: Creative Commons 0 (Public Domain)
- Direct Download
Cereal Nutrition
A dataset of nutritional information for major US cereal brands.
- Original Source: Kaggle
- License: Creative Commons BY-SA 3.0 (Attribution-ShareAlike)
- Direct Download
Titanic Passenger List
Information about passengers aboard the Titanic.
- Original Source: Kaggle
- License: Creative Commons 0 (Public Domain)
- Direct Download
US Honey Production
This dataset is inspired by Honey Production in the USA, extended to the period 1998-2017. It is also joined with data from USGS's Pesticide National Synthesis Project, allowing evaluation of the statistical connections between honey production and the use of neonicotinoid (neonic) pesticides.
- Original Source: Kaggle
- License: Creative Commons 0 (Public Domain)
- Direct Download
Congressional Voting Record
Voting information for the 1984 US Congress
- Original Source: UCI
- License: Citation Requested
- Direct Download
HR Data
Modified version of a synthetic human resources dataset. Dr. Carla Patalano, “Human Resources Data Set.” Kaggle, doi: 10.34740/KAGGLE/DSV/774340.
- Original Source: Kaggle
- License: Creative Commons BY-SA 4.0
- Direct Download
Banking Data
Dataset based on Moro S, Cortez P, Rita P. A data-driven approach to predict the success of bank telemarketing. Decision Support Systems. 2014;62:22-31. doi:10.1016/j.dss.2014.03.001. Further augmented here
- Original Source: Kaggle
- License: Creative Commons BY-NC-SA 4.0
- Direct Download
Spotify Data
Spotify song information.
- Original Source: Kaggle
- License: Community Data License Agreement - Sharing, 1.0
- Direct Download
- Direct Download (by Artist)
- Direct Download (by Artist with genres)
- Direct Download (by Genre)
- Direct Download (by Year)
Iris Data
Iris Flower Dataset
- Original Source: UCI
- License: Citation Requested
- Direct Download
Housing Data
House Sales in King County, USA
- Original Source: Kaggle
- License: Creative Commons 0 (Public Domain)
- Direct Download
Auto MPG Data
Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
- Original Source: UCI
- License: Citation Requested
- Direct Download
Biking Data
Fanaee-T, Hadi, and Gama, Joao, 'Event labeling combining ensemble detectors and background knowledge', Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg.
Laboratory of Artificial Intelligence and Decision Support (LIAAD), University of Porto INESC Porto, Campus da FEUP Rua Dr. Roberto Frias, 378 4200 - 465 Porto, Portugal
Original Source: http://capitalbikeshare.com/system-data; Weather Information: http://www.freemeteo.com; Holiday Schedule: http://dchr.dc.gov/page/holiday-schedule
- Original Source: UCI
- License: Citation Requested
- Direct Download
Jane Austen Texts
Project Gutenberg https://www.gutenberg.org
- Original Source: Project Gutenberg
- License: Public Domain
- Direct Download
German Traffic Signs
J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel. The German Traffic Sign Recognition Benchmark: A multi-class classification competition. In Proceedings of the IEEE International Joint Conference on Neural Networks, pages 1453–1460. 2011.
- Original Source: INI
- License: Citation Requested
- Direct Download