CSE 370: Software Engineering Principles

W03 Case Study Reading: Designing Software with Microservices

Instructions

Prepare for your team meeting and your individual analysis by thoughtfully reading the following case study.

Submission

After completing the reading, return to Canvas to submit a quiz about the basic case study facts.

Then, in separate assignments, you will discuss this case study with your team and complete your analysis of it.

Case Study: Designing Software with Microservices

The specific characters, companies, and projects of this case study are fictional, but they are based on actual circumstances that occurred at a company that was building out its multinational web presence.

Aiden O'Brien

Aiden O'Brien looked out of the office window and admired the view of the pedestrians below as they crossed the River Corrib on the Wolfe Tone Bridge. Although lunch hour in downtown Galway, Ireland, was in full swing, there weren't as many people out and about as usual. Maybe a walk would do him good, since he could avoid the usual press of people downtown. Anything would be better than trying to make the decision he'd been agonizing over. Life had been pretty good to him so far, but lately the pressure had been ratcheting up. Aiden sat back in his chair and started to think about how it all came about.

More than 10 years ago, Aiden joined the company MyHomeStuff.com as one of its first Irish employees. The company was founded in the United States, but labor costs in Ireland were favorable, so the company sent its Chief Technical Officer to Ireland to look into hiring a team in Galway. Aiden was among the first to sign up, and the team had grown steadily since then. There were now three teams in Galway, each with its own team lead who reported directly to Aiden, the Director of Software Engineering for the Irish office. Initially, the Galway teams were given side projects with limited impact, mostly because the US-based company was unsure what it was going to get. With several successful projects behind them now, the company was sending more and more meaningful work their way.

Three months ago, the Galway office received its biggest project yet: the launch of MyRugsStuff.com. This was meant to be a sister website to MyHomeStuff.com, but with a narrow focus on the lucrative rugs category and an experience tailored to customers buying rugs. The hope was to move the rugs business from MyHomeStuff.com to the new site in a seamless way. The company had aggressive goals for growing the rug business that just didn't work within the confines of the old MyHomeStuff.com site. The plan was to have the new site ready to go in just six months.

Despite the tight timeline, the team in Galway had seen this as an opportunity to build an excellent website from the ground up using the latest technology stack. Work on the older MyHomeStuff.com site was more difficult and subject to outdated processes and procedures. This was a chance to do things in a better way. The developers were beyond excited about this opportunity.

For months now, the Chief Technical Officer of MyHomeStuff.com had been preaching the virtues of using the DevOps methodology. He wanted to see teams make frequent small changes that were continuously deployed to production. This would deliver valuable features to the customers more quickly than ever. The company signed up with a large DevOps platform in hopes of moving most of the deployments to this cloud-based system over time. However, this required a huge change in processes and infrastructure for the company, and many teams struggled to adopt this change.

One thing that made adoption difficult was the prevalence of monolithic services. Throughout the years the teams at MyHomeStuff.com had created large, highly scaled services to handle different aspects of the business such as product details, pricing strategy, promotions, search results, and of course cart management and checkout with various payment methods. These large-scale services had a very complex deployment strategy complete with rollback plans, heavily scripted installs, and manual interventions. Moving these large-scale services was going to take a long time, and the pressure to deliver new features was always present. Finding time in the schedule to make the move was an ever-present concern for many teams across the company.

For Aiden and his team, the assignment to build this new business presented an immediate opportunity to move to the DevOps platform and start using microservices instead of monolithic services. The idea behind microservices was that you could create small, narrowly focused services that served only a few well-defined functions, and string them together to accomplish your goals. This would allow the team to do small, frequent deployments, reduce inter-team dependencies, and make it easy to adopt and refresh technology with the latest and greatest offerings.

Because this was a brand-new effort, also known as a greenfield project, the team had to design everything from scratch. Aiden had stressed to the teams that everything they built needed good design documentation. In the past, Aiden had spent significant time reverse engineering systems that needed to be upgraded but lacked documentation. If the teams were going to do things well, carefully documented designs needed to be an integral part of the process.

Aiden had intended to do a design review on every system before getting started, but things were moving fast and there were a lot of moving parts. Before long, Aiden was just checking to make sure that a design document of some kind had been added to the wiki before work on each microservice began.

The teams worked quickly and before long, there were more than a dozen microservices deployed on the DevOps platform. Of course, everything was behind a feature flag, and customers could not yet see the MyRugsStuff.com website until the official launch, but the developers and internal stakeholders could see progress on the site in near real time.

Eventually, the security team reached out to see how things were coming along. With the halfway point nearing, they thought it would be prudent to do a quick security review to see if there were any holes to plug. Aiden had sent them some preliminary information, then forgot all about it. There were plenty of other things to keep him busy.

Aiden winced as he remembered the unexpected Slack message he received from Brennan Gallagher, VP of Security in the US.

Brennan: Aiden - I am hearing from the security team that you are using JWT tokens for authorization on the new rugs site. Is that right?

Aiden: Yes - that's right. I understand that's a pretty standard way to handle authorization over the web. One of our senior engineers recommended it so we decided to run with it.

Brennan: Sure - but it has to be done right, and I have some concerns. Can you send me over some documentation so I can take a look at how you have implemented it?

Aiden: Yeah - no problem. The team has done a great job of documenting their designs before building anything.

Aiden had gone to the corporate wiki site where all the design documents were stored, found the design page for authorization details, glanced over it, and then pasted the URL into Slack for Brennan to review (see Exhibit A). It wasn't long before the reply came back.

Brennan: Hey - I'm looking over these designs and I don't think I'm finding what I'm looking for. I'm specifically looking to see if the JWT token can be updated in the browser to accomplish an elevation of privilege attack. Do you know if an end user has access to that token?

Aiden: I cannot answer that off the top of my head. Let me do some digging and I'll get back to you.

Aiden went back to the wiki page and reviewed the diagrams, this time with a more critical eye. There was a box-and-stick diagram that had been drawn on a whiteboard with some handwriting that wasn't easy to read. This diagram showed how the browser might interact with the authentication service and other downstream services, but only at a high level. There was also a UML sequence diagram that showed calls between the browser and the backend authentication service, but without much detail about where or how the JWT token was stored. It was clear that in the rush to start development, only the barest of details had been provided about how the authorization system really worked.

Frustrated, Aiden reached out over Slack to Jericho Forbes, lead for the team that built the authentication service. Jericho was also one of the early Irish employees at MyHomeStuff.com. He was a very close friend of Aiden's and had been instrumental in the success of many of their previous projects.

Aiden got right to the point.

Aiden: Hey, Jericho - I have a question for you. Where does that JWT token get stored in the browser? Does the user have access to it?

Jericho: Hmm. I think that the token is stored in local storage in the browser. It doesn't get cleared out when you leave so you only have to log in once then you're good for about 30 days on that browser without having to log in again.

Aiden: Local storage? Doesn't that mean that the user can make updates to that token if they want to?

Jericho: Well, yeah, but it's base64 encoded so it's not like it's in plain text or anything.

Aiden: Anyone who knows enough to look in local storage probably knows how to decode base64 as well. I think this is going to be a problem. Brennan Gallagher was asking me about an elevation of privilege possibility. What is in the JWT? I don't see those details in the design document.

Jericho: Yeah, well, the usual things like username, full name and all that. I guess we do have a field for if they are a StuffPlus member or not.

Aiden: So anyone can update that flag and see the StuffPlus pricing and promotions?

Jericho: Yeah - I guess that's true.

Aiden: =( OK. Guess I have some bad news to share with Brennan then.
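The exchange above turns on the difference between encoding and protection. The following Python sketch (standard library only) assumes a standard three-part, HS256-signed JWT and a hypothetical `stuffplus` claim, since the design page leaves the privilege field as "[List TBD]". It shows that reading or editing a token's payload needs no key at all. A server that recomputes the token's HMAC signature would detect the edit, but the design page does not say whether any MyRugsStuff.com service performs that check.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    # JWT segments use base64url with the trailing '=' padding removed.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # Put the stripped padding back before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def encode_json(obj: dict) -> str:
    return b64url_encode(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def sign_hs256(claims: dict, secret: bytes) -> str:
    # header.payload.signature, where signature = HMAC-SHA256(secret, "header.payload").
    signing_input = encode_json({"alg": "HS256", "typ": "JWT"}) + "." + encode_json(claims)
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(sig)


def read_claims(token: str) -> dict:
    # No key needed: anyone who can copy the token out of local storage can do this.
    return json.loads(b64url_decode(token.split(".")[1]))


def tamper(token: str, **changes) -> str:
    # Edit the payload but keep the original signature, as a browser user could.
    header, payload, sig = token.split(".")
    claims = json.loads(b64url_decode(payload))
    claims.update(changes)
    return f"{header}.{encode_json(claims)}.{sig}"


def verify_hs256(token: str, secret: bytes) -> bool:
    # Recompute the signature server-side and compare in constant time.
    signing_input, _, sig = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return hmac.compare_digest(expected, b64url_decode(sig))


SECRET = b"hypothetical-server-secret"  # would live only on the server
token = sign_hs256({"id": "1234567890", "fullname": "John Doe", "stuffplus": False}, SECRET)
forged = tamper(token, stuffplus=True)

print(read_claims(forged)["stuffplus"])                          # True
print(verify_hs256(token, SECRET), verify_hs256(forged, SECRET))  # True False
```

If services trust the decoded claims without a check like `verify_hs256`, the forged token works; with the check in place it is rejected, although the claims stay readable to the user either way.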

This was indeed bad news. StuffPlus memberships cost about $49 and gave users access to extra savings and special deals. These deals could not be made available to just anybody, otherwise the program would not pay for itself, and the company could lose money. Reluctantly, Aiden reached out to Brennan to share the news. As expected, Brennan insisted that the problem be rectified immediately, and that a new authorization scheme be implemented so that this couldn't happen. Although Aiden had agreed to make the change, he stressed to Brennan that the site was not in production yet, and there was a great deal of feature work on the schedule that would have to be rearranged to accommodate his suggestion. Aiden would have to discuss the matter with Sean Haskins, the VP of Emerging Business who was overseeing the project.

Sean was based in the US but despite the time difference was pretty good at answering the phone at all hours of the day. Sean pointedly ignored any attempts to reach him over Slack, though. A phone call would have to do. Aiden picked up the phone and reached out.

Sean: Aiden! How are things in lovely Galway today? It must be getting close to quitting time for you, eh?

Aiden: Yeah - but I'm not sure if I'll be going home early today or not. We have a bit of a problem here.

Sean: Oh, nothing you can't handle, I'm sure. As long as our site deploys in June like you said it would, there's nothing to worry about.

Aiden: Well, I guess that's the thing. We just talked to the security folks, and they are recommending that we switch our authentication mechanism because the one we are currently using isn't secure enough.

Sean: Oh, well how long will that take?

Aiden: Hard to say, but we've got several services that would have to be retrofitted now so I imagine we're looking at two to four weeks.

Sean: Unacceptable. We cannot afford a schedule slippage that long. Tell me, in what way is it not secure enough?

Aiden: Honestly, I'm not entirely sure - the design documents are a little vague, but it's possible that a savvy user could hack their browser to give themselves StuffPlus membership benefits.

Sean: Oh, is that all? Look - there aren't going to be enough users initially to worry about that. Let's just get this thing out the door and we can plug that hole later.

Aiden: True, but there may be other security problems with this that we haven't considered. I would need to sit down with the developers to see what else might be an issue. I don't know if it's the right thing to just ignore this problem and fix it later.

Sean: Well, I do. The company has been promising the new rugs experience in the second quarter for a while and if we don't deliver it, we'll look bad to our investors. We can't stop what we're doing just to make the security people happy. Just tell me who I need to talk to, and I'll make this go away.

Aiden: You probably should talk to Brennan Gallagher.

Sean: OK, I will. I'll catch up with you later.

The phone call ended abruptly, as it often did with Sean. It looked like a war between the VP of Security and the VP of Emerging Business was now brewing. Maybe going home now was the right solution for today.

The next day, Aiden arrived at work, anxious to see what Slack messages and e-mails might be waiting for him. Before he had even settled in, Jericho burst through the door.

"We need an API Gateway."

"A what?" Aiden asked.

"An API Gateway, like the one the MyHomeStuff.com site uses. This allows us to reconfigure the security stuff with a single service and forward the calls to the appropriate microservice. We could probably do this in just a few days."

Aiden was familiar with the API gateway used in other parts of the company. The idea was intriguing, but he knew no design change was ever that simple. Aiden put his fingers together and looked across at Jericho.

"Convince me—why do you think an API Gateway is the best way to go here?"

Jericho smiled. "Like I said, we could implement this in just a few days. There is open-source software out there that does all the heavy lifting—all we have to do is create a simple configuration file. The gateway can handle all of the requests and make sure they are authenticated, then pass them through to the right service. That means in the future if we have to do any updates to security we only do it one place. Also, we can add other features like request logging that makes auditing much easier as well. If you want, we can get started on it today!"
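Jericho's description of a gateway (authenticate each request once, pass it to the right service, and log everything in one place) can be sketched in a few lines of Python. This is not Zuul or any other real gateway's configuration format; the route table, internal service URLs, and `is_valid_session` check are hypothetical, and a real gateway would also proxy the HTTP request rather than just return the upstream URL.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict, Optional

logging.basicConfig(level=logging.INFO, format="%(name)s %(message)s")
log = logging.getLogger("gateway")

# Hypothetical route table: public path prefix -> internal microservice base URL.
ROUTES: Dict[str, str] = {
    "/api/products": "http://product-details.internal",
    "/api/pricing": "http://pricing.internal",
    "/api/cart": "http://cart.internal",
}


@dataclass
class Decision:
    forward_to: Optional[str] = None  # upstream URL when the request is allowed
    status: Optional[int] = None      # HTTP error status when it is not


def handle(path: str, session_id: Optional[str],
           is_valid_session: Callable[[str], bool]) -> Decision:
    # Every request is logged here, which is what centralizes auditing.
    log.info("path=%s has_session=%s", path, session_id is not None)
    # Authentication happens once, before any microservice sees the request.
    if session_id is None or not is_valid_session(session_id):
        return Decision(status=401)
    # Longest matching prefix wins, so more specific routes can be added later.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return Decision(forward_to=ROUTES[prefix] + path[len(prefix):])
    return Decision(status=404)


VALID_SESSIONS = {"abc123"}  # hypothetical stand-in for a real session check
allowed = handle("/api/pricing/sku/42", "abc123", VALID_SESSIONS.__contains__)
rejected = handle("/api/cart", None, VALID_SESSIONS.__contains__)
print(allowed.forward_to)  # http://pricing.internal/sku/42
print(rejected.status)     # 401
```

The appeal Jericho describes is visible here: security and logging live in one function. The same property means every request depends on this one component being available, which is the concern Alvin raises later.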

"Well, hold on. This is a pretty big change. Let me talk to Alvin Peters, who is over the API gateway, to get his opinion as well."

"What is there to discuss? The API Gateway is a standard practice everywhere. That's why they included it on our main web site. Honestly, Aiden, I have been thinking about this for a while and wanted to bring it up with you earlier, but you're always so busy. I really think we need to get going on this as soon as possible. We should have done this from the beginning."

Jericho turned to leave, then turned back, "Oh, and just so you know, I spent some time last night documenting the design so you could see what that might look like. It's a child page under the authentication page on the wiki now."

"Well, let me chat with Alvin and I'll let you know what I decide. Thanks, Jericho."

Jericho stepped out of the room and Aiden quickly pulled up the page. Indeed, Jericho had put together a very helpful diagram. The diagram showed how the client would call the authentication service, set a secure cookie on the browser, then subsequent calls to other services would read the cookie before forwarding to downstream services. It looked rather straightforward. Aiden was grateful for the complete documentation.
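The cookie step in Jericho's diagram can be illustrated with Python's standard `http.cookies` module. The cookie name `session`, the 30-day lifetime (borrowed from Jericho's description of the current token), and the attribute choices are assumptions; the diagram does not say whether the cookie carries the JWT itself or an opaque session ID. The main contrast with local storage is the `HttpOnly` attribute, which keeps page JavaScript (for example, an injected script) from reading the value. It does not stop users from editing cookies in their own browser's developer tools, so the server still has to validate whatever the cookie contains.

```python
from http.cookies import SimpleCookie
from typing import Optional

COOKIE_NAME = "session"          # hypothetical cookie name
THIRTY_DAYS = 30 * 24 * 60 * 60  # lifetime assumed from the current 30-day token


def build_set_cookie(value: str) -> str:
    # Value for the Set-Cookie header the authentication service would send.
    jar = SimpleCookie()
    jar[COOKIE_NAME] = value
    morsel = jar[COOKIE_NAME]
    morsel["httponly"] = True      # hidden from document.cookie and page scripts
    morsel["secure"] = True        # sent only over HTTPS
    morsel["samesite"] = "Strict"  # not attached to cross-site requests
    morsel["path"] = "/"
    morsel["max-age"] = THIRTY_DAYS
    return morsel.OutputString()


def read_cookie(cookie_header: str) -> Optional[str]:
    # What the gateway would do with the browser's Cookie header on later calls.
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get(COOKIE_NAME)
    return morsel.value if morsel is not None else None


print(build_set_cookie("abc123"))
print(read_cookie("session=abc123; theme=dark"))  # abc123
```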

It would be a few hours before Alvin, who was based in the US, would be at work. Aiden sent an e-mail asking him to reach out as soon as he was in the office and available. Thankfully, a few hours later Alvin reached out over Slack.

Alvin: You needed to talk to me about the API gateway? What's your question?

Aiden: Hey, you know we're building the new MyRugsStuff.com site and I wonder what your take is on adding an API gateway.

Alvin: Ugh - I would avoid it if you can. It's been more trouble than it's worth if you ask me.

Aiden: Really? Why is that?

Alvin: You know - all of the site traffic has to go through that gateway. It's impossible to predict how to scale it so it seems like we're always growing and shrinking the gateway. Not only that, but no one over here really owns the gateway. Our team owns it sort of by default because someone has to own it but nobody wants it. I'm constantly getting requests to deploy configuration updates to the gateway. It's really disruptive to our team. I schedule a few story points in every sprint just for possible gateway changes. Do you remember that day that the hardware the gateway was on had a hard drive failure?

Aiden: Yeah - that was several months ago, wasn't it? You were only down for about 15 minutes though, weren't you?

Alvin: Yes, exactly 13 minutes but the entire site was down because everything goes through that gateway. I mean everything! I had the CTO breathing down my neck the whole time and I couldn't do anything but wait for the operations team to work their magic. The accountants claim we lost $75,000 in revenue during that 13-minute outage.

Aiden: Wow. That must have been miserable.

Alvin: Oh - and also, I've been reading online that the version of the gateway we have been using might be deprecated soon. It's open source and the maintainers of the software haven't made any updates for quite a while. People have been asking about feature updates and security updates, but the maintainers haven't responded in a few weeks.

Aiden: This is all great information. Thanks for reaching back to me.

Alvin: You bet. If you want my advice - just don't do it.

So there it was. It seemed like there was no straightforward answer here. If he added the API Gateway like Jericho suggested, then they would probably deploy on time and everyone would be happy. Of course, if they did add the API Gateway, it might just be the beginning of another long-term series of headaches.

Aiden glanced at his inbox. There were recent messages from Brennan and Sean that were unopened. Just then, a Slack message appeared, from the CTO! It looked like this was a decision that could not wait any longer.


Exhibits

Exhibit A: Sample Wiki Design Page
MyRugsStuff.com Authorization Service Design Page
Overview

The MyRugsStuff.com Authorization Service is used to authenticate and authorize users of the MyRugsStuff.com website. It relies on the existing MyHomeStuff.com authorization service to authorize existing customers. It returns a JWT token to the user's browser once the user is authenticated.

Helpful Diagrams
A box and stick diagram showing some data flowing back and forth.
Call Sequence
A sequence diagram showing the calls to the authorization services.
Endpoints
URL: /auth
HTTP Method: POST
Expected Request: username: <username>, password: <password>
Expected Response:
{ "id": "1234567890",
  "fullname": "John Doe",
  "privilege": [List TBD] }

Note: response is base64 encoded.


Footnotes

  1. Netflix ZUUL is an API gateway that has been widely used in the industry. Excellent documentation for this open-source product is found here: https://github.com/Netflix/zuul. As of August 2024, the latest version of ZUUL is 2.5.1.

    Here are some sample articles detailing some of the troubles of moving from ZUUL to Spring Cloud Gateway due to deprecation.

