CSE 450 - Machine Learning & Data Mining

Module 06 — Content Creation, Project

Overview

After a few more meetings, your team has been asked to address the following requests from the stakeholders:

Thomas, COO of HackPressIO

So the main thing we need at this point is a proof-of-concept model that shows we could, with enough work, generate full texts in the style and voice of a particular author.

Monika, Senior Developer

I agree. The original team used Jane Austen as their training corpus, but you could use any author's work you find at Project Gutenberg. Just be sure to clean up the data appropriately.
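As a starting point for the cleanup Monika mentions, here is a minimal sketch of one way to strip the Project Gutenberg license header and footer and undo hard line wrapping. It assumes the plain-text download uses the usual "*** START OF THE PROJECT GUTENBERG EBOOK ..." and "*** END OF THE PROJECT GUTENBERG EBOOK ..." marker lines; older files may word these differently, so check your file by eye. The function names are illustrative and are not taken from the starter notebook.

```python
import re

# Marker lines that usually surround the actual book text (assumed wording).
START_RE = re.compile(
    r"\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK[^\n]*\n",
    re.IGNORECASE,
)
END_RE = re.compile(
    r"\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK",
    re.IGNORECASE,
)


def strip_gutenberg_boilerplate(raw):
    """Return only the text between the START and END marker lines.

    If a marker is missing, fall back to the start or end of the file.
    """
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    begin = start.end() if start else 0
    finish = end.start() if end else len(raw)
    return raw[begin:finish]


def clean_text(raw):
    """Strip boilerplate, then join hard-wrapped lines within each paragraph."""
    body = strip_gutenberg_boilerplate(raw).replace("\r\n", "\n")
    # Blank lines separate paragraphs; single newlines are just line wraps.
    paragraphs = re.split(r"\n\s*\n", body)
    paragraphs = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in paragraphs if p)
```

Depending on the author, you may also want to drop chapter headings, tables of contents, or editor's notes, which this sketch leaves in place.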

Thomas, COO of HackPressIO

I think that in order for me to feel good about whatever pipeline is being developed, I'd want to see works in the style of something other than Jane Austen. In fact, I'd really like to see the style of more than one author, preferably at least two or three.

Johnny, Data Science Intern

Which means a separate network trained on each author's works...

It might be a good idea to define one network architecture for all authors, and then save the trained model for each author, so you can load a given author's profile into the network whenever you want. But we'll leave the specifics up to you.
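One possible reading of Johnny's suggestion is sketched below: a single function builds the shared architecture, and each author gets their own weights file that can be loaded into it. The layer sizes, the "models" directory, and the helper names are assumptions for illustration only, and the starter notebook may structure its network differently. TensorFlow is imported inside the functions so that the path helper can be used without loading it.

```python
import re
from pathlib import Path


def weights_path(author, model_dir="models"):
    """Map an author name to a weights file, e.g. 'Jane Austen' ->
    models/jane_austen.weights.h5 (hypothetical naming scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "_", author.lower()).strip("_")
    return Path(model_dir) / f"{slug}.weights.h5"


def build_model(vocab_size, seq_len=40):
    """Build the shared architecture; layer sizes are illustrative guesses."""
    from tensorflow import keras  # imported lazily; requires TensorFlow 2.x

    model = keras.Sequential([
        keras.Input(shape=(seq_len,)),
        keras.layers.Embedding(vocab_size, 64),
        keras.layers.LSTM(128),
        keras.layers.Dense(vocab_size, activation="softmax"),
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    return model


def save_author(model, author, model_dir="models"):
    """Save the trained weights for one author."""
    path = weights_path(author, model_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    model.save_weights(str(path))
    return path


def load_author(author, vocab_size, seq_len=40, model_dir="models"):
    """Rebuild the shared architecture and load one author's weights into it."""
    model = build_model(vocab_size, seq_len)
    model.load_weights(str(weights_path(author, model_dir)))
    return model
```

Note that weights only fit a network of the same shape. If each author's corpus produces a different vocabulary size, you would need either a vocabulary shared across authors or a saved vocabulary alongside each weights file.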

Thomas, COO of HackPressIO

Finally, you don't need to use this for all of your styles, but once you have one up and running, could you start a story with the following prompt:

"The world seemed like such a peaceful place until the magic tree was discovered in London."

I'd like to see how it compares to some others that were generated from the same prompt. You can submit that one as a separate document alongside your traditional executive summary that outlines your approach and your results in the various styles.
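To start a story from Thomas's prompt, the prompt first has to be converted into the same numeric form the network was trained on. The sketch below assumes a character-level model with a vocabulary built from the training corpus. The starter notebook may tokenize differently, so treat the function names and the choice to drop unknown characters as assumptions.

```python
def build_vocab(corpus):
    """Map each distinct character in the training text to an integer id."""
    return {ch: i for i, ch in enumerate(sorted(set(corpus)))}


def encode_prompt(prompt, char_to_id, seq_len):
    """Turn a prompt into the last `seq_len` character ids.

    Characters the model never saw in training are dropped, because the
    network has no input id for them. Prompts shorter than `seq_len`
    are returned as-is (no padding).
    """
    ids = [char_to_id[ch] for ch in prompt if ch in char_to_id]
    return ids[-seq_len:]


def decode(ids, char_to_id):
    """Turn a list of character ids back into text."""
    id_to_char = {i: ch for ch, i in char_to_id.items()}
    return "".join(id_to_char[i] for i in ids)


# Example with a toy corpus standing in for an author's cleaned text.
vocab = build_vocab("the magic tree")
seed = encode_prompt("The tree!", vocab, seq_len=4)
print(decode(seed, vocab))  # prints "tree"
```

Here the capital "T" and the "!" are dropped because they do not occur in the toy corpus. With a real corpus, check which characters of the prompt survive encoding before you generate from it.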

More Tips from Johnny

keras vs tf.keras

Remember the warning from the last module: Keras used to be a standalone library, but as of TensorFlow 2.0 (September 2019) it ships as part of Google's TensorFlow library, as tf.keras.

Keep that in mind when following any tutorial written before that date. Most of the API and functions will be the same, but your import statements will likely differ: for example, an older tutorial's "from keras.layers import LSTM" becomes "from tensorflow.keras.layers import LSTM".

For more information, see this article on the change.

Starter Code

This Colab notebook contains the starter code left by the previous team.

There may be better approaches than what that notebook is doing, but it will at least get you started.

Saving Models

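There are multiple ways to save a Keras model: you can save the entire model (architecture, weights, and optimizer state) in one file, or save only the weights and rebuild the architecture in code before loading them.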
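As a rough illustration, the sketch below saves one trained model both ways. It relies on the Keras methods model.save and model.save_weights. The helper names, the output directory, and the file extensions are assumptions: ".keras" presumes a recent TensorFlow/Keras release, while older releases commonly used ".h5" for full models.

```python
from pathlib import Path


def save_both_ways(model, out_dir, name):
    """Save `model` as a full-model file and as a weights-only file.

    `model` is expected to be a Keras model; `out_dir` and `name` are
    hypothetical, e.g. "models" and "austen".
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    full_path = out_dir / f"{name}.keras"
    weights_path = out_dir / f"{name}.weights.h5"

    # Full model: architecture, weights, and optimizer state together.
    model.save(str(full_path))
    # Weights only: smaller, but the architecture must be rebuilt in code.
    model.save_weights(str(weights_path))
    return full_path, weights_path


def load_full_model(path):
    """Reload a model saved with model.save; no build code is needed."""
    from tensorflow import keras  # imported lazily; requires TensorFlow 2.x

    return keras.models.load_model(str(path))
```

A full-model file is convenient when you just want to reload and generate text. Weights-only files pair naturally with a single shared architecture, since the same build code can be reused for every author.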

Johnny, the Data Science Intern, catches you after work:

Hey, I know you're probably busy, so I put a bunch of comments and explanations in the code left behind by the previous team, so make sure you read through those. Also, make sure you review the RNN tutorials from the reading assignment.


  1. COO photo by Jonas Kakaroto on Unsplash 

  2. Senior Developer photo by Mimi Thian on Unsplash 

  3. Data Science Intern photo by Fábio Lucas on Unsplash