A Tale of Two Scrums: Agile Done Right and Agile Gone Wrong
Dr. Jeff Sutherland, co-creator of Scrum and Senior Advisor and Agile Coach to OpenView, explains why Healthcare.gov’s poor leadership and failure to deliver a “working product” resulted in a sinking ship. On the other hand, he also explains how companies can follow in the footsteps of Spotify’s success by hiring experienced Agile coaches, employing seamless coordination, and mastering the art of systematic waste removal. Listen in to this week’s Labcast below.
This Week’s Guest
“[Spotify’s] competition is Google, Amazon and Apple, any one of whom could crush them in a nanosecond, unless they’re faster, better, cheaper. And they have to stay that way. They have to keep on running out ahead.” – Jeff Sutherland
Healthcare.gov: An example of software development gone wrong
- Lack of coordination between the front end and back end. While the front end employed Scrum, it missed the second principle of the Agile Manifesto: “Working software” [3:00]
- Weak leadership. There were 20-25 consultancies working on the project, but no one steering the ship. [4:05]
- The project should have been launched on a state-by-state basis. This would have allowed success for the states that did work and allowed developers to work on the back end to fix the other states. [3:30]
Spotify: An example of Agile done right
- Why employ Scrum? Going Agile allows Spotify to be faster, better, and cheaper than industry Goliaths like Google, Amazon, and Apple. [8:40]
- Spotify ensures its Scrum masters are also experienced Agile coaches. [8:20]
- Excellent team coordination. Employing “squads” and “tribes,” Spotify makes sure its teams are continuously deploying software and sprinting. [9:15]
- Systematic waste removal. Knowing when to cut off slow teams is crucial to elegant Agile. [11:30]
Jonathan: Hello everyone, and welcome to Labcast. I’m your host, Jonathan Crowe, and with us today we have a very special guest — Dr. Jeff Sutherland. He’s the creator of Scrum. For those who don’t know, Scrum is an Agile development framework that dramatically changes how people work, how they organize themselves, and it makes organizations faster, better and far more productive.
Jeff has an incredible background and vast experience in software development. He’s been a VP of Engineering as well as a CTO or CEO of 11 software companies. He’s also the chair of the Scrum Foundation and he’s a senior adviser and Agile coach for us here at OpenView Venture Partners. In fact, Jeff leads monthly Scrum master and product owner trainings right here in the OpenView offices, and he’s kindly taking a break from one of those to talk with us right now.
Jeff, thank you very much for being here, for taking the time.
Jeff: Yeah, thanks for doing this, Jonathan.
Jonathan: The topic we’ll be discussing today is one I’m really excited about. It’s examples of Agile done right, and Agile gone bad. This is something, I know, that comes up a lot in the trainings. People always like to hear about examples of Scrum and Agile in action, and lately there’s been one very prominent example of Agile gone bad that I’m curious to hear your thoughts about. Can you talk a little bit about healthcare.gov, what went wrong and how better Agile practices could have maybe helped avoid what’s become a really notorious software disaster?
Jeff: Right. Well, a couple of weeks ago NPR kept interviewing me almost every day for about a week on healthcare.gov. Healthcare.gov is primarily a waterfall project gone bad, and that’s quite common. 86 percent of waterfall projects fail. Most of them, about half of that 86 percent, are total failures, never used, software never used. Actually, in the Department of Defense, historically it’s been even worse. They did an evaluation of $34 billion worth of Department of Defense projects around the year 2000. 75 percent were total and complete failures. Not a single line of code used.
So what we saw in healthcare.gov was the norm. However, they even put some worse twists on that. Because of the pressure to deploy after working for several years, it turns out that they only had six days of testing. Well, anybody in software development knows that is a total disaster.
Jeff: But they just pushed it live and nobody fessed up until after it was live that nothing works. So, they really compounded their errors.
Interestingly enough though, now, there’s been a lot written about this. A lot of people have really dug into it, and the front end of healthcare.gov was an Agile project and they actually completed it in a few months, but the problem is they missed the second Agile principle. Principle number two in the Agile Manifesto is “working software.” So, the people on the front end did their piece, but they never had anything that worked.
Jonathan: Right. Right.
Jeff: So if they took it from an Agile perspective, actually it’s quite easy. In healthcare.gov, one of the things they wanted to do was transfer people over to the state agencies that had healthcare systems, like Massachusetts, so that all the front end needed to do was hand off to Massachusetts, and then people should be able to sign up, right?
Jonathan: Makes sense.
Jeff: And if they did that state by state, then they could have systematically at least done the states that worked, and then for those that didn’t have their systems, then they could deal with this very large back end problem that they had.
Jeff: Now it turns out if you look at what’s going on now, that’s exactly what’s happening. They’re starting to bring it up state by state. They should have done that at the beginning, so all you can ask is “What were these people thinking?” If you looked at the organization chart, they had 20 to 25 of the largest consultancies in the United States on this project. So clearly it was a case where nobody’s in charge of the ship, or the people in charge, probably administrators in the healthcare administration, were clueless about software development, and they got exactly what you would expect to get when people that don’t know what they’re doing are trying to run a waterfall project, which doesn’t work most of the time anyway.
Jonathan: So it sounds like there’s a few things there too, you know, the problem of too many cooks in the kitchen, no clear owners really. Also, you have two aspects of this that are being developed in very different systems. So clearly you have the Agile-developed front end that does work, and then you have something completely separated, and that kind of goes against some of the principles as well, right?
Jeff: Right. This whole idea of historically people have tended to layer development. You know, “Let’s build a big database, and then let’s build a big middle layer, and then at the end of the day we’ll put some front end on it and hope it works.” That doesn’t typically work. We, last year, when Ken Schwaber and I wrote Software in 30 Days we did an in-depth analysis of the FBI Sentinel project, which was “Let’s bring all the data together on terrorists so we know, from every agency, what’s going on.”
Jonathan: So another huge project?
Jeff: Same as ObamaCare. You know, three or four years and $400 million later, virtually nothing works.
Now, in that case, the government stepped in, the government accounting office stepped in, sent a cease and desist letter to the vendor, in this case, it was primarily one vendor, and they stopped the project.
Jeff: It was such a critical project that the FBI brought in an Agile Chief Information Officer and Agile CTO and they set up 15 guys in the basement of the FBI building, and a year later, after $400 million had been wasted, they had spent $30 million, and the project came up, and came in under budget.
Jonathan: So you have one example, you know, this big, sprawling project. Money’s being poured into it. Then you switch gears. They bring in a very small team of people who are very, very focused, they’re Agile-driven, and it comes in far under budget.
Jonathan: That’s pretty amazing. Yeah. And so, I mean, I think that’s a good segue into talking about, you know, those are some of the disasters, some of the examples of development gone bad. What about some of the examples of development done right in an Agile way?
Jeff: Well, we have a lot of companies doing software quite well today. I mean, Microsoft has a 3,000-person Scrum team for all their development projects, or developer tooling, and they deliver at the end of every sprint, three-week sprints, a brand new release of every product. Okay?
So that’s what’s going on today. At Google it’s even more intense. They have 15,000 developers working on one branch of code. They go live multiple times a day; for desktop and mobile applications, it’s every week or two, and they run 75 million automated tests every day.
Jonathan: So, that’s incredible. I mean…
Jeff: So people…
Jonathan: That’s kind of the pinnacle of what’s possible.
Jeff: Right. It’s amazing as I go around doing training, so many companies have no idea what they have to compete with today. I mean, Google, Microsoft, even Adobe, Autodesk, all the people have major Scrum implementations. They are going to absolutely crush anyone who does not get this done right.
One of the most interesting stories, you brought up earlier, was Spotify. They’re very interesting because they’ve done a very elegant Agile implementation, mainly because they insist that the Scrum masters actually be Agile coaches, experienced Agile coaches, many of whom they hire from outside the company. Some of the leading trainers in the world have been brought into Spotify to fill that kind of Scrum master job.
Jeff: So they have approached this really systematically, and the reason that the management did that is because their competition is Google, Amazon and Apple, any one of whom could crush them in a nanosecond, unless they’re faster, better, cheaper, and they have to stay that way. They have to keep on running out ahead. You know, look at iTunes now. It’s got iTunes Radio, okay?
Jeff: It’s just like Spotify’s radio. They’re going to try to eat Spotify’s lunch. So, in order to go faster and faster, the teams have to get better and better. Well, Spotify has many teams in multiple locations all over the world. They call their teams “squads.” These are little Scrum teams, and every squad can deploy its software at the end of every sprint, which is now three weeks, without breaking any other team.
So, in order to do that, they have to manage, they have to aggregate squads into what they call “tribes,” groups of teams. An interesting thing about Spotify squads is that each team has a piece of the product that’s visible that is completely theirs.
Jeff: But they have to deploy and change and upgrade that constantly without breaking anything else.
Jonathan: So that takes a lot of coordination. Amazing.
Jeff: All the dependencies have to be managed across the company to do that. So, what’s happened at Spotify, though, is that every team deploying their software upgrade every sprint is not fast enough. They’re going to have to go to continuous deployment where they’ll deploy multiple times a sprint. We have teams, we have HubSpot down in Cambridge. They deploy 170 times a day.
Jeff: On slow days. Okay? That’s what’s going on in the industry.
Jonathan: That’s incredible.
Jeff: So, the picture, from starting with ObamaCare, I mean, that is so far, that is some dinosaur in the prehistoric era.
Jonathan: Right. Right.
Jeff: Compared to what’s going on here right downtown, or even probably on the street we’re sitting on.
Jonathan: Absolutely. And so, there were a couple things that jumped out with using Spotify as an example. First, they have incredible buy-in from management and leadership. They’re bringing in outside Agile coaches to actually be on the team. They’ve decided if they’re going to do Scrum, they’re going to do it right.
Jonathan: And then, also, this idea of having small teams focus on one thing they completely own. It sounds like those are a couple good principles.
Jeff: Exactly. Exactly. There’re several things that you have to manage when you’re scaling up like that, and in every case Spotify has done it elegantly. For example, they originally had an operations team that did the deployment, but that was too slow, so they said okay, we can’t have somebody blocking deployment. It’s not that we don’t need good operations practices, but we want to move the operations team out of the road. We have no more release teams. That’s a lot of extra overhead. They get in the way. The development team itself must deploy, but they need help with the right tools and procedures from the operations experts so they can all deploy together consistently. Okay?
Jeff: So, this idea of removing waste from the system is fundamental to Scrum and again, Spotify is an elegant example of systematically removing waste.
Jonathan: Right. Well, that’s great, and Jeff, I know that people who are more interested in Scrum can find you very simply by Googling. They can go to ScrumInc.com. Are there any other places they should go to learn more about Scrum and also connect with you?
Jeff: Well, I have a blog at Scrum.JeffSutherland.com. People can go there. We have two books that are out on Amazon right now. One is called Software in 30 Days. The other one is a novel that we’ve written that people should take a look at. So go to Amazon. There’s both a novel and a technical book, Software in 30 Days, that are really good books to get started with.
Jonathan: Okay. Great. Those are definitely good suggestions everyone should check out, and Jeff, thank you so much for taking the time. Really appreciate it.
Jeff: Thank you.
Can you think of any other examples of Scrum gone wrong or companies that have mastered Agile?