Questions
You're at dinner with the bank's President, its VP of Marketing, and its Senior Data Scientist. They want to make sure you have the data required to answer the questions they're most interested in.
Be prepared to answer the following questions:
Data Science Methods
Miguel Ferreira, Bank President, asks:
The core task we're interested in is identifying those customers most likely to subscribe to a term deposit.
A term deposit is a fixed-term investment in which money is deposited into an account at a financial institution; in this case, our institution.
I don't know a lot about data science, but I've been trying to get up to speed. Do you think a supervised or unsupervised approach would work best for this situation?
Train/Test Split
Beatriz, Senior Data Scientist, asks:
Miguel, that is a great question.
While we're on detailed questions: the dataset has approximately 37,000 records. How much of that data will you use to train your model?
Based on your initial analysis of the data, your team's view is one of the following:
- A simple 80/20 split will provide us with enough data to train and test our model accurately.
- We will use a 50/50 split so that we have as much training data as testing data.
- We will pull out 1,000 records for our test dataset and use the other 36,000 for training. This gives our model more to train on and will produce better results.
- We will use all 37,000 for training and use cross-validation to evaluate the model.
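To make the options concrete, the sketch below computes the train/test sizes each one implies. It is a minimal stdlib Python sketch, assuming exactly 37,000 records (the text says approximately); the helper names are hypothetical, and the k-fold helper assumes the records were shuffled beforehand and uses 5 folds purely for illustration.

```python
import random

N_RECORDS = 37_000  # assumption: the text only says "approximately 37,000"


def split_sizes(n_records, test_fraction):
    """Return (train_count, test_count) for a given test fraction."""
    n_test = round(n_records * test_fraction)
    return n_records - n_test, n_test


def shuffled_split(records, n_test, seed=42):
    """Shuffle a copy of the records, then hold out the first n_test as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_records, k):
    """Yield (train_indices, validation_indices) for k contiguous folds.

    Assumes the records are already in random order, so contiguous folds
    behave like random ones.
    """
    fold_size, remainder = divmod(n_records, k)
    start = 0
    for fold in range(k):
        size = fold_size + (1 if fold < remainder else 0)
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_records))
        yield train, validation
        start += size


print("80/20:", split_sizes(N_RECORDS, 0.2))  # → (29600, 7400)
print("50/50:", split_sizes(N_RECORDS, 0.5))  # → (18500, 18500)
train, test = shuffled_split(range(N_RECORDS), n_test=1_000)
print("1,000 held out:", (len(train), len(test)))  # → (36000, 1000)
for fold, (train_idx, val_idx) in enumerate(k_fold_indices(N_RECORDS, k=5), start=1):
    print(f"fold {fold}: train={len(train_idx)}, validate={len(val_idx)}")
```

With 5-fold cross-validation, every record is used for validation exactly once, and each of the five models is trained on 29,600 records.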
Additional Insights
Francisco, VP of Marketing, asks:
Aside from the core marketing question Miguel mentioned, I'm wondering if there are other insights we could gain from our data.
I can look at the data and tell that some days of the week or some months produce better results than others.
Is it possible for us to see whether those results hold for all customers, or whether some types of customers respond better on certain days than others?
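Francisco's question amounts to comparing response rates per customer segment and contact day instead of only overall. A minimal stdlib Python sketch, assuming each contact record carries a segment label, a day, and a yes/no outcome; the record layout, the `response_rates` helper, and the toy records are hypothetical and made up for illustration, not drawn from the bank's data.

```python
from collections import defaultdict


def response_rates(records):
    """Compute the subscription rate for each (segment, day) pair.

    records: iterable of (segment, day, subscribed) tuples, where subscribed is a bool.
    Returns a dict mapping (segment, day) -> fraction of contacts that subscribed.
    """
    contacts = defaultdict(int)
    subscriptions = defaultdict(int)
    for segment, day, subscribed in records:
        key = (segment, day)
        contacts[key] += 1
        subscriptions[key] += int(subscribed)
    return {key: subscriptions[key] / contacts[key] for key in contacts}


# Hypothetical toy records for illustration only, not real bank data.
toy_records = [
    ("student", "mon", True), ("student", "mon", False),
    ("student", "thu", False), ("student", "thu", False),
    ("retired", "mon", False), ("retired", "mon", False),
    ("retired", "thu", True), ("retired", "thu", True),
]

for (segment, day), rate in sorted(response_rates(toy_records).items()):
    print(f"{segment:8s} {day}: {rate:.0%}")
```

In these toy records the better day flips between segments, which is the kind of pattern Francisco describes. With real data, cells with few contacts give noisy rates, so group sizes should be checked before drawing conclusions.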
Data Privacy Laws
Beatriz, Senior Data Scientist, asks:
Since we're operating in the European Union, we're subject to GDPR compliance requirements.
What do you think we might need to do for this project to comply with the GDPR?
Based on your initial analysis of the data, your team's view is one of the following:
- This is historical data, so we should be just fine.
- This is anonymous data, so we should be just fine.
- The GDPR doesn't apply in this situation, since we're just building a model, not selling data.
- In order to use this data under GDPR, we'll need to get consent from the customers in the dataset.