Programming Resources
The assignments in this course require a fair amount of Python coding, as well as the use of a few popular Python-based data science tools.
Agency-Based Education
It's expected that you'll either know some Python coming into this course, or be able to come up to speed quickly, largely on your own. You don't need to be an expert, or even highly skilled, but the more you know about Python coding, the better off you'll be.
In addition, we'll be using a variety of machine learning and data science libraries and frameworks that you may or may not have experience with.
We don't spend much time in this class explicitly learning Python or these tools, for a few reasons:
- In industry, you'll often be asked to complete a task or project that requires quickly coming up to speed on a technology or software library that you've never used before.
- By the time you start your first job, many of the tools and libraries will be different anyway, so we try to focus on teaching principles and helping you "learn how to learn" on your own so you can keep up with this rapidly changing field.
- As a 400-level student, you are expected to have enough diligence and experience reading technical documentation and tutorials that you can handle this largely on your own.
However, we have put together this guide, as well as a separate guide with tips for reading technical documentation that you may find useful.
Python 2 vs 3
Python went through a major revision with the release of Python 3 in 2008, and Python 2 reached its end of life in 2020. In this course, we use Python 3. You may find tutorials for (or already know) Python 2.
For the purposes of this course, there really aren't that many differences you have to think about.
You can see a good summary of the most important differences here.
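As a rough illustration, here is a small Python 3 sketch of a few differences that commonly trip up people coming from Python 2. The selection is ours and is not taken from the linked summary; the variable names are made up for this example.

```python
# Python 3 behavior for a few things that changed from Python 2.

# print is a function, so the parentheses are required.
print("Hello, Python 3")

# "/" always performs true division; "//" performs floor division.
true_quotient = 7 / 2
floor_quotient = 7 // 2

# range() returns a lazy sequence object, not a list.
first_five = list(range(5))

# Text strings are Unicode by default; bytes are a separate type.
text = "café"
encoded = text.encode("utf-8")

print(true_quotient, floor_quotient, first_five, len(text), len(encoded))
```

If you are following a Python 2 tutorial, these are the kinds of spots where its code may need small adjustments to run in Python 3.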
Python
Google tells me there are 387 million hits for "Python Tutorial", so obviously you have a lot of options, from the official Python tutorial to this 11-hour video course.
Students often wonder which tutorial is the best for a given subject. Unfortunately, there is no such thing.
Some students learn better from books, others from websites, and some prefer videos. Some students want interactive tutorials, others feel like they can only learn if they take a class.
Here are some Python tutorials that I think are good for students:
- Codecademy's Python Course
- Google's Python Course
- Learn Python the Hard Way
- How to Think Like a Computer Scientist
Pandas
Pandas is a data science library that makes it easy to do common data manipulations and analysis. We'll be using it quite a bit in this course.
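To give a feel for what "common data manipulations" can look like, here is a minimal sketch using a small made-up table. The data and column names (`student`, `section`, `score`) are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical toy data, invented for this example.
grades = pd.DataFrame(
    {
        "student": ["Ana", "Ben", "Cho", "Dev"],
        "section": ["A", "A", "B", "B"],
        "score": [88, 92, 75, 81],
    }
)

# Filter rows with a boolean mask.
high_scores = grades[grades["score"] >= 80]

# Group rows by a column and aggregate each group.
mean_by_section = grades.groupby("section")["score"].mean()

print(high_scores)
print(mean_by_section)
```

Filtering and group-wise aggregation like this cover a large share of everyday Pandas use, so they are a reasonable place to start in whichever tutorial you pick.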
There aren't quite as many Pandas tutorials as there are Python tutorials (only 16 million hits for "Pandas Tutorial").
Here are some Pandas tutorials that I think are good for students:
- Pandas Official Getting Started Tutorials
- 10 Minutes to Pandas
- Google's Colab-based, Ultraquick Pandas Tutorial
- Kaggle's Pandas Tutorial
NumPy
NumPy is a numeric computation library designed to allow Python to carry out super-optimized matrix algebra operations. It'll be rare in this course that we need to use NumPy directly, but many libraries (including Pandas and scikit-learn) are built on top of NumPy, so knowing something about it can sometimes be handy.
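The following is a small sketch of the kind of array and matrix operations NumPy provides. The arrays are arbitrary values chosen for illustration.

```python
import numpy as np

# Two small 2x2 matrices with arbitrary values.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])

# Matrix multiplication uses the @ operator.
product = a @ b

# Arithmetic with a scalar is applied to every element.
scaled = a * 10

# Reductions can run along an axis; axis=0 averages down each column.
column_means = a.mean(axis=0)

print(product)
print(scaled)
print(column_means)
```

Operations like these run in compiled code rather than in Python loops, which is a large part of why libraries build on NumPy arrays.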
Here are some NumPy tutorials that I think are good for students:
- NumPy Official Getting Started Guide
- NumPy for Absolute Beginners
- Google's Colab-based, Ultraquick NumPy Tutorial
scikit-learn
scikit-learn is a machine learning library that we'll be using quite a bit in this course. It is built on top of NumPy and accepts Pandas data structures as input, so you'll often use all three together.
My advice for this library is to get an overview of how it works, and especially learn about the pipeline functions. Then as you need to use each algorithm, read the details about that algorithm in the User Guide and API manual.
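As one possible starting point for the pipeline functions, here is a minimal sketch that chains a preprocessing step and a model. The choice of dataset (the built-in iris data), scaler, and classifier is an assumption made for illustration and is not a course requirement.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A pipeline applies each step in order: scale the features, then classify.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Fitting the pipeline fits the scaler and the classifier on training data only.
model.fit(X_train, y_train)

# score() reports mean accuracy on the held-out data.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Because the scaler is fit inside the pipeline, the test data never influences the preprocessing, which is one of the main reasons to prefer pipelines over scaling by hand.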
Altair
There are lots of visualization libraries out there, and pandas has some visualization functions built into it, but we recommend you become familiar with Altair: