Building a Modern Security Engineering Organization
October 24, 2016
Over the past decade, the world has fundamentally changed in a variety of ways, with huge implications for
business. We’ve seen the rise of transformational new technologies, for instance, such as cloud, mobile, and
big data. When it comes to running a modern security engineering team that keeps your business secure, three changes have been particularly important: speed to market, continuous deployment, and increasing the cost of the attack.
Speed to Market Taxes Security
For starters, things move a lot faster than they used to. Code that once took weeks or even months to deploy
can now go into production almost instantaneously. Plus, we’ve got the added complexity of having more
people with access to production systems than ever before as the responsibilities of development and operations teams merge. Last but not least, the cost of launching attacks has dropped significantly, making it a lot easier for hackers to target companies.
To adequately address these changes, today’s security engineering teams need to understand continuous
deployment and DevOps. Not only that, they need to figure out ways to drive up attack costs to make themselves a harder target for attackers.
Near Instantaneous Deployment is the New Norm
We’ve come a long way from the days of traditional waterfall, where deployment to production was often
months or even years away. In my previous role at Etsy as Director of Security Engineering, we were pushing
new code to production an average of 30 times a day. Additionally, we were constantly iterating in
production using feature flags, ramp-ups, and A/B testing. That has been a game changer for
security, requiring everyone to adopt a completely new mindset.
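For readers who haven't worked with ramp-ups, here is a minimal sketch of how a percentage-based feature flag might decide who sees new code. It is an illustration only, not how any particular system (Etsy's included) implements it. The function name `in_ramp_up`, the flag name, and the user IDs are all hypothetical.

```python
import hashlib


def in_ramp_up(flag_name: str, user_id: str, percent: int) -> bool:
    """Return True if this user falls inside the first `percent` buckets.

    Hypothetical helper: hashes the flag and user together so that each
    user lands in a stable bucket from 0 to 99 for a given flag.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # same input always yields the same bucket
    return bucket < percent


if __name__ == "__main__":
    # Made-up flag and users: ramp the flag to roughly 10% of users.
    for user in ("user-1", "user-2", "user-3"):
        print(user, in_ramp_up("new_checkout_flow", user, 10))
```

Because the bucket is stable, raising the percentage only ever adds users, and dropping it to 0 turns the feature off for everyone without a new deployment. That is part of what makes reacting to a newly discovered issue so quick.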
In the old deployment models, like waterfall, security functioned as a blocker to the business, requiring
sign-off before allowing anything to go into production. The shift to quicker deployment models is therefore often
scary to security teams. It feels like code is now going to be flying out the door without any degree of control.
But here’s the thing. The control we thought we had was really just an illusion. Why? Because every practical
development methodology results in shipping code with vulnerabilities in one way or another.
What makes continuous deployment a better and ultimately safer option is that it allows you to actually react
when those vulnerabilities are discovered. That’s critical given most customers’ growing demands and
expectations, particularly when issues arise.
If you’ve ever lived through waterfall development methodologies or out-of-band patches, then you know
how painful it can be when an emergency comes up. Whether it’s because of a security issue, a performance
issue, or just a general bug fix, shipping any type of fix, especially for an emergency, has traditionally been
incredibly hard. Most organizations that only release every 18 months just aren’t designed to rush something
out the door in a matter of days or even weeks. With continuous deployment, by contrast, there’s no such
thing as an out-of-band patch. An “emergency fix” is just one of the dozens of deployments that are already
going to happen that day.
What makes continuous deployment safe?
In a word, safety comes from “visibility.” Over the past five years, DevOps teams have been focused on
increasing visibility and awareness to facilitate informed decision-making. Although security is a few years
behind the curve here, we’re finally headed in that direction now, too.
To explain why, let me draw an analogy to aviation. Security, at present, is like piloting a plane without any
instruments. Sure, you can fly, but when there are bumps along the way you have no idea if it’s because
you’ve just hit some turbulence or because your engines are on fire. In other words, it’s like living in a binary world where things are either fine or they’re not, when of course it’s never really that black or white.
Thankfully, with the shift to DevOps and continuous deployment, we have the opportunity to gain far greater
visibility and awareness than ever before so that we can make better decisions. Of course, to ensure the kind of visibility and awareness you need, you’ve got to actively share information with other teams and organizations. One way of doing this is by embracing the cultural change that the shift to DevOps/continuous deployment often triggers.
Greater Communication is Key
With continuous deployment, you no longer kick your code over to QA for six weeks and then on to staging
for twelve more. Instead, you perform code reviews and tests and then ultimately deploy it to production yourself. By removing the old organizational blockers, speed is dramatically increased.
For security engineering teams, this means that if you’re a roadblock to development, it’s now easy for them
to work around and actively avoid you. A big part of the solution is better communication, and here are some
key lessons learned:
- Don’t be a jerk. This should be obvious, but empathy needs to be a core part of your security team’s culture. People should want to talk to security, so make sure that you’re hiring with that in mind. Empathy with operations and development teams is especially important. Understanding their daily battles and commiserating gives you credibility, which makes you more successful in the long run.
- Make realistic tradeoffs. Don’t fall into the trap of thinking every issue is critical. If you prioritize the ones that really matter and agree to not hold up the works for those that don’t, you’ll find that teams will be much more willing to engage with you.
- Explain impact clearly. Telling colleagues in another department that “if an attacker did X and Y, our user data would be compromised” paints a clear picture. Telling them that “the input validation in this function is weak” doesn’t. Remove the security language barrier by speaking in plain English.
- Reward people who communicate with your team. Believe it or not, t-shirts, gift cards and high fives all work (shockingly) well. Creating a culture where interacting with security is seen as a positive thing will dramatically pay off.
- Take the false positive hit yourself. Wherever possible, avoid sending unverified issues to engineering / operations teams. When issues are discovered or reported, have the security engineering team verify them and potentially even make the first attempt at a patch. When security sends loads of unverified issues to engineering teams and they turn out to be false positives, engineering will rightfully ignore future communications from the security team, which is exactly what you want to avoid.
- Scale via team leads. Build relationships with technical leads from other teams, encouraging them to make security part of their team’s culture. This ensures that when new engineers join their respective teams, security is emphasized to them even without your direct involvement.
While it may sound trivial, the best thing you can do to help ensure the success of your security team is to promote better communication.
Widespread Access Needs to be Managed
Most startups begin with a pretty simple access control policy: everyone gets access to everything. That’s particularly true as development and operations teams merge. Of course, as organizations grow and scale, this becomes increasingly problematic and pressure starts to mount to put some policies and regulations around who can access what.
The key to getting it right is to avoid knee-jerk reactions that take capabilities away from people who are just trying to do their jobs. Instead, focus on building safe ways to perform needed job functions by taking the following approach:
Methodology
- Don’t be a blocker, be an enabler. Figure out what the underlying function or capability is that your
colleagues need. What is it that they require to get their job done? Once you understand the need, get out of the way!
- Do not say NO, instead deliver alternative solutions. Create an alternative, safe way for them to perform the function or capability. Give better ways to get the job done and employees will use them.
- Build options and impact change. Transition your entire organization over to the new, safer way of doing things. A transition takes time, so don’t expect this to happen overnight.
- Phase out the old in a controlled manner. Begin soft-failing the old system, setting up alerts to notify you of any usage of the old unsafe way of doing things so you can correct those instances.
In Practice
A common example of this is a development organization where a large percentage of engineers have SSH access to production systems. SSH is typically used in an administrative capacity to provide access to a prompt on a remote or local system. In this case, the steps to improving security are:
- Determine why SSH access to production systems is needed. Often it’s because people need to view application logs to debug issues.
- Create the alternative solution. In this case, the application logs used for debugging are the required item, not SSH. Therefore, by providing an alternative way to safely access that data via a central logging system like Splunk, ELK, etc., SSH access can be removed over time.
- Transition over time. Publicize the new alternative way to access the data. When users are aware of a new, better, and more secure way of getting their jobs done they will naturally transition to the new system.
- Monitor for behavioral anomalies. Begin alerting on SSH access to production systems so a reminder about the new approach can be sent. Again, this is a phase in the transition of users to the new system. Continue to gently point them to the new and improved method without becoming the department of “no.”
- After transition, push the final holdouts. Restrict SSH access to only those who require it, e.g., sysops. Make sure that you have given ample time and direction to those in need of the solution.
If you take this approach, everyone wins. Security doesn’t become a blocker by removing capabilities that people need to be effective; instead, it provides a safe way to perform the required tasks.
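The monitoring step above can be sketched in a few lines of code. The following is a rough illustration that assumes production hosts write OpenSSH's standard "Accepted ... for USER from HOST port N" message to a log you can read. The function name, the allowlist, and the sample log lines are all hypothetical, and a real deployment would more likely express this as a saved search in the central logging system.

```python
import re

# Matches OpenSSH's successful-login message, for example:
# "Accepted publickey for alice from 10.0.0.5 port 51234 ssh2"
ACCEPTED_LOGIN = re.compile(
    r"Accepted (?P<method>\S+) for (?P<user>\S+) from (?P<source>\S+) port \d+"
)


def unexpected_ssh_logins(log_lines, allowed_users):
    """Return (user, source) pairs for successful logins by users who are
    not on the allowlist. Nothing is blocked; this only finds people to remind."""
    alerts = []
    for line in log_lines:
        match = ACCEPTED_LOGIN.search(line)
        if match and match.group("user") not in allowed_users:
            alerts.append((match.group("user"), match.group("source")))
    return alerts


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        "Oct 24 10:00:01 web01 sshd[123]: Accepted publickey for alice from 10.0.0.5 port 51234 ssh2",
        "Oct 24 10:00:03 web01 sshd[125]: Accepted publickey for sysop1 from 10.0.0.7 port 51236 ssh2",
    ]
    for user, source in unexpected_ssh_logins(sample, allowed_users={"sysop1"}):
        print(f"Reminder for {user} (from {source}): logs are available in the central logging system")
```

Note that the sketch only reports; it never denies access. That matches the soft-fail phase, where the goal is a friendly nudge toward the new system rather than a locked door.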
Increasing the Cost of Attack Brings Advantage to the Defender
Although it has become cheaper and easier to conduct attacks, there are several ways to use this to your advantage as a defender. Some of the most effective approaches are to run realistic attack simulations against your organization, have a disclosure policy, and potentially even a bug bounty program. The goals of these sorts of programs are to:
- Incentivize people to report issues to you.
- Drive up the costs of vulnerability discovery and exploitation.
- Provide external validation of where your security program is and isn’t working.
If you’re worried about budgetary concerns, money is rarely the main motivation for researchers reporting issues (although it certainly helps!). Similarly, if you’re concerned about inviting attacks, the fact is that if you’re on the Internet you already get a free penetration assessment every single day; you just don’t receive the report.
Before launching a disclosure program or a bounty, one of the most effective things you can do is take note of what vulnerability classes you expect to see and what ones you don’t. You can then compare your expectations against the issues that actually wind up getting reported to provide extremely useful data on where your security program is working well and where it needs adjustment and iteration.
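As a rough sketch of that comparison, the snippet below takes the vulnerability classes you predicted and the classes actually reported, then sorts them into three groups. The function name and the class labels are illustrative assumptions, not data from any real program.

```python
from collections import Counter


def compare_expectations(expected_classes, reported_classes):
    """Compare predicted vulnerability classes against what was reported.

    `reported_classes` holds one entry per report, so repeats are counted.
    """
    expected = set(expected_classes)
    counts = Counter(reported_classes)
    return {
        # Predicted and reported: your model of your weaknesses was right.
        "confirmed": sorted(c for c in counts if c in expected),
        # Reported but not predicted: a blind spot worth investigating.
        "surprises": sorted(c for c in counts if c not in expected),
        # Predicted but never reported: a control may be working,
        # or nobody has looked there yet.
        "not_seen": sorted(expected - set(counts)),
        "counts": dict(counts),
    }


if __name__ == "__main__":
    # Made-up labels for illustration only.
    result = compare_expectations(
        expected_classes=["xss", "csrf", "sqli"],
        reported_classes=["xss", "xss", "ssrf"],
    )
    print(result)
```

The "surprises" group is usually the most valuable, since it points at whole classes of issues your existing program did not anticipate.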
Keep Calm and Enable Your Business
The shift to DevOps and continuous deployment often feels scary to security teams because it represents such a significant departure from the way we’ve approached security in the past. However, instead of reducing security, this transition actually affords us a unique opportunity to fundamentally shift the position of security from being a blocker to enabling greater business velocity.