CSE 212: Programming with Data Structures

W03 Team Activity: Sets and Maps

For this activity, you need to meet with your group for a synchronous meeting using a video sharing tool, such as Microsoft Teams.

This activity is designed to take about one hour.

Instructions

Use the following as an agenda for your team meeting. The person that is assigned to be the Lead Student for this gathering should help guide the group through these steps.

Before the meeting: Verify the time, location, and lead student

This could be as simple as posting a message to your Microsoft Teams channel that says something like, "Hi guys, are we still planning to meet tomorrow at 7pm Mountain Time? Let's use the Microsoft Teams video feature again." Or, if someone else has already posted a message like this, it could be as simple as "liking" their message.

Make sure to identify who will be the lead student for this week. For example, "Emily, are you still planning to be the lead student for this week?"

Begin with Prayer

Review the Preparation Learning Activities (5 minutes)

Briefly discuss the following questions: (The Lead Student should ask each question.)

  1. What is the difference between a set and a map?
  2. What is one use of a Dictionary (besides the class/object examples shown in the learning activity)?
  3. What is one of the most important things to remember when answering technical job interview questions?

These problems should be completed and discussed as a group. Answers are provided below.

Solve the Problem: Unique Letters (10 minutes)

  1. Open the course repository and browse to the week03/teach project in your code editor, and open the file UniqueLetters.cs
  2. This code will determine if a string has all unique letters. For example, "abcdefg" has unique letters but "abccdefg" does not because the "c" is repeated more than once. The code is implemented using loops which results in an O(n2) performance.
  3. Answer the question together: How can unique letter method be written with O(n) performance using a set? As you answer this question, talk about the behavior, purpose and/or performance of a set to help you arrive at an answer.
  4. After you think you know how to solve the problem (take no more than five minutes), compare your approach to the solution below.
  5. Solution: Unique Letters Approach

    Question: How can the unique letter function be written with better performance using a set data structure?

    Answer: A set has the property that all items must be unique. If the letters in the input text do not form a set, then the function should return False. To determine if the letters in the word form a set, I can loop through each of the letters in the input text and add them one at a time to a set. Looping through each of the letters will be O(n). Before I add a letter to the set, I will check to see if the letter is already in the set. This check will be O(1). I can exit the loop as soon as I find a duplicate letter.

    Alternative Answer: A set has a property that all items must be unique. If we copied letters from a string into a set (an O(n) operation), then one of two things would happen. If the letters were all unique, then the size of the set would be equal to the size of the string. However, if the letters were not all unique, the then size of the set would be smaller than the size of the string. We could create the set of unique letters from the string and then compare the sizes to get the answer in O(n) time.

  6. When you have a good idea for a solution, uncomment the lines in Program.cs to include running the UniqueLettersSolution and take a look at the UniqueLettersSolution.cs file and discuss it together.

Solve the Problem: Display Sums (25 minutes)

  1. Open the course repository and browse to the week03/teach project in your code editor, and open the file DisplaySums.cs
  2. Your goal is to write a program that will find and display all pairs of numbers in a list that sum up to 10. No duplicates should be displayed. This could be done in O(n2) using a loop within a loop. However, using a set, this can be done in O(n) time. You should assume that the list of numbers provided has no duplicates.
  3. Answer this question together: How can you solve the problem using a set data structure? As you answer this question, talk about the behavior, purpose, and/or performance of a set to help you arrive as the answer.
  4. After you think you know how to solve the problem (take no more than five minutes), compare your solution to the one below.
  5. Solution: Display Sums Approach

    Question: How can find pairs of numbers in a list (assume no duplicates) that sum to a value of 10 and display them in O(n) time using a set data structure?

    Answer: We could use two embedded for loops looking for pairs but this would be O(n2). A set gives us O(1) for lookup. If we knew the value we were looking for and if those values were put in a set, then we could get O(1). We could add all of the numbers into a set. Then, we could loop through the original list again (call each number x) and then lookup quickly in the set for 10-x. However, we might get duplicate answers (such as 3+7 and 7+3).

    We could avoid the duplication and the two sequential loops. Suppose my list was {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. While building the set from the list, I could actively check to see if 10-x is in my set yet. If I was currently looking at the '2', then 10-2 (or 8) would not be in the set yet. However, eventually we will get to the '8' and then '2' will be in set by that time. This is O(n).

  6. Based on your understanding, implement the DisplaySumPairs function.
  7. When you are done, uncomment the lines in Program.cs to include running the DisplaySumsSolution and take a look at the DisplaySumsSolution.cs file and discuss it together.
Think Out Loud

Notice that in the solution for the answer above, the provided response walks through a few different ideas:

When you talk through ideas like this, your interviewer gets a chance to see the way you approach a problem and consider alternatives. Notice how the solution here starts with the simplest approach, then considers something a little more robust, but still not quite adequate before arriving a a complete solution.

Solve the Problem: Using a Map to Summarize Data (20 minutes)

  1. Open the course repository and browse to the week03/teach project in your code editor, and look at the basketball.csv file. This file contains statistics for professional NBA basketball players. Notice that Column 0 contains the Player ID, column 1 contains the year, and column 8 contains the points they scored that year.
  2. Your goal is to write a program that will determine the total number of points scored by a player over all the years that they played, and then summarize this in a table showing the top 10 players with the highest total points scored.
  3. Open the file Basketball.cs and review the starter code. This starter code opens the csv file, reads through it line by line and extracts the playerId and points from that year.
  4. Add logic to this program to use a Map to determine the total points for each player.
  5. Once you have built the map you can convert it to an array, sort it, and then display the top 10 players with the highest point total.
  6. When you are done, compare your approach to the one in the BasketballSolution.cs file and discuss it together.

Conclude

At the end of your meeting:

  1. Determine who will be the lead student for the next meeting.

Submission

When you are finished:

Other Links: