The Need for IT Operations Agility: Lessons of WannaCry

There is little doubt that the news of ransomware like the recent outbreak of the WannaCry (aka Wcry, WannaCrypt) taking hold in critical infrastructure hits home with every IT professional. The list of affected clients of any ransomware or critical vulnerability is made even more frightening when it means the shutting down of services which could literally affect people’s health like the NHS is experiencing.

Would it be any different if it were a small hardware chain? What if it was a bank? What if it was your bank, and your money was now inaccessible because of it? The problem just became very real when you thought about that, didn’t it?

Know Your (Agile) Enemy

Organizations are struggling with the concept of more rapid delivery of services. We often hear that the greatest enemy of many products is status quo. It becomes even more challenging when we have bad actors who are successfully adopting practices to deliver faster and to iterate continuously. We aren’t talking Lorenzo Lamas and Jean Claude Van Damme kind of bad actors, but the kind who will lock down hospital IT infrastructure putting lives at risk in search of ransom.

While I’m writing this, the WannaCry ransomware has already evolved and morphed into something more resilient to the protections that we had thought could prevent it from spreading or taking hold in the first place. We don’t know who originally wrote the ransomware but we do know that in the time that we have been watching it that it has been getting stronger. As quickly as we thought we were fighting it off by reducing the attack surface,

The Risks of Moving Slowly

Larger organizations are often fighting the idea of risks of moving quickly with things like patching and version updates across their infrastructure. There are plenty of stories about an operating system patch or some server firmware that was implemented on the heels of its release to find out that it took down systems or impacted them negatively in one way or another. We don’t count or remember the hundred or thousands of patches that went well, but we sure do remember the ones that went wrong. Especially when they make the news.

This is where we face a conundrum. Many believe that having a conservative approach to deploying patches and updates is the safer way to go. Those folks view the risk of deploying an errant patch as the greater worry versus the risk of having a vulnerability exposed to a bad actor. We sometimes hear that because it’s in the confines of a private data center with a firewall at the ingress, that the attack surface is reduced. That’s like saying there are armor piercing bullets, but we just hope that nobody who comes after us has them.

Hope is not a strategy. That’s more than just a witty statement. That’s a fact.

Becoming and Agile IT Operations Team

Being agile on the IT operations side of things isn’t about daily standups. it’s about real agile practices including test-drive infrastructure and embracing platforms and practices that let us confidently adopt patches and software at a faster rate. A few key factors to think about include:

  • Version Control for your infrastructure environment
  • Snapshots, backups, and overall Business Continuity protections
  • Automation and orchestration for continuous configuration management
  • Automation and orchestration at all layers of the stack

There will be an onslaught of vendors using the WannaCry as part of their pitch to help drive the value of their protection products up. They are not wrong in leveraging this opportunity. The reality is that we have been riding the wave of using hope as a strategy. When it works, we feel comfortable. When it fails, there is nobody to blame except for those of us who have accepted moving slowly as an acceptable risk.

Having a snapshot, restore point, or some quickly accessible clone of a system will be a saving grace in the event of infection or data loss. There are practices needed to be wrapped around it. The tool is not the solution, but it enables us to create the methods to use the tool as a full solution.

Automation and orchestration are needed at every layer. Not just for putting infrastructure and applications out to begin with, but for continuous configuration management. There is no way that we can fight off vulnerabilities using practices that require human intervention throughout the remediation process. The more we automate, the more we can build recovery procedures and practices to enable clean rollbacks in the event of a bad patch as well as a bad actor.

Adapting IT Infrastructure to be Disposable

It’s my firm belief that we should have disposable infrastructure wherever possible. That also means we have to enable operations practices which mean we can lose portions of the infrastructure either by accident, incident, or on purpose, with minimal effect on the continuation of production services. These disposable IT assets (software and hardware) enable us to create a full stack, automated infrastructure, and to protect and provide resilience with a high level of safety.

We all hope that we won’t be on the wrong side of a vulnerability. Having experienced it myself, I changed the way that I approach every aspect of IT infrastructure. From the hardware to the application layers, we have the ability to protect against such vulnerabilities. Small changes can have big effects. Now is always the time to adapt to prepare for it. Don’t be caught out when we know what the risks are.

Why your Security Products are Inherently Insecure

You’re being sold snake oil every day in the world of IT. It is about time that we just lay this out honestly. The products that you are buying are not solutions. They are methodologies. Why does this semantic difference matter? It matters because we are blindly putting tools into place under the assumption that they are a solution to a problem. The truth is that they are merely tools in the fight to solve the problem.

Conceptual – Logical – Physical

Go back to the basics of systems architecture and infrastructure design for a moment. We view things in three stages of the design process as conceptual, logical, and physical. Conceptual design is thinking at a high level on the goal such as “the application servers will be protected from intrusion”. Moving to the logical physical version to expand on that concept would be something like “Layer 4-7 firewalls will be deployed at the ingress and egress point for the application servers”. Getting down to the physical is something like “Product X will be deployed to provide layer 4-7 firewall protection” which is the result of designing to meet the first two requirements.

The issue that we face as an industry is two-fold. First, we often start at “Product X will be deployed” without having done the due diligence on what the actual business and technical requirements are which need to be solved. The second issue is that we buy Product X, deploy Product X, and then everyone goes for a project completion dinner and celebrates that we have finished up the deployment with the bold assumption that we are inherently secure.

Many organizations are buying products or embracing some new technologies into their environments based on a promise. Promises should always bear translated to assumptions. I’ll start with one that I am seeing a lot of lately which is this:

“Containers are more secure for applications than virtual machines”

This is both true and false at the same time. The wording is important. What the phrase should say is “containers have the ability to be architected and deployed to be more secure for applications than traditional virtual machines”.

Here’s why phrasing is important.

Why is your security product inherently insecure?

You can’t buy a bow and arrow and suddenly you are an archer. The same goes for security. Just because you have bought a security product, it does not mean that you are secure. It’s actually the polar opposite. Your environment is inherently insecure. Even if you are absolutely sure that you are deployed in the most secure manner possible, you should ALWAYS ASSUME that you have been breached.

What’s the solution for this? This comes in three forms:

  1. Accept that you are insecure and build processes around that assumption
  2. Deploy and continuously test your security platforms
  3. Engage third-party testers and products to ensure continuous objective testing

Let’s dive into these three areas a little bit further.

Accept that you are insecure and build processes around that assumption

Point 1 is the key to begin with. Assume you have been breached. Now what? How are you aggregating your logs? How are you protecting the logging both locally on the application endpoints as well as in your central logging environments? If you have to assume that your ingress has been compromised, you also have to assume that your log environments have been compromised as well. You need local protection on each system plus centralized, read-only aggregation with regular snapshots of that environment to ensure its integrity too.

The build process you use will inevitably call on some external dependencies. It could be patches, software updates, or any of a wide variety of files and applications. Assume that these are inaccessible or compromised as you define your programmatic build process to use locally cached data and application dependencies as much as possible. And yes, the programmatic build process is key to ensuring consistency and security. You should include checksum and signature detection for all source files as you put them into the virtual application instances.

Deploy and continuously test your security platforms

Test-driven development is a great methodology. I have long been a user and a proponent of what is known as test-driven infrastructure and this includes the need for security as a part of the cycle. The only way that you know your detection system is working is if you test it when there is an issue. Assuming detection without truly testing the response means that you are relying on the assumption. Your CISO does not rely on assumptions, and neither do your customers.

Whichever products you embrace in your IT security portfolio, they will inevitably come with some form of baked in testing procedures and processes. Be aggressive and adamant with your vendors that this is a requirement for you. Nobody wants to be caught going back after a vulnerability to have to find out that it was detectable and preventable.

Engage third-party testers and products to ensure continuous objective testing

I hire someone to do my taxes. Yes, I can do them myself. That doesn’t mean that I’m an expert and can find every advantage within the tax code to get the best results. Why would I treat security and vulnerability testing any differently than any other discipline in my business and IT organization. Using 3rd party companies will give you the ability to lean on them for expertise, and most importantly, certification and validation of your security stance in an active environment.

Having spent years in financial services environments which have stringent requirements around auditing and security, I can tell you that no matter how secure even the IT security team thought they were, a 3rd party can come in and teach some rough lessons in a couple of hours.

Turn Assumptions into Actions

Going back to the example that containers are more secure than virtual machines gives us a great one to work from. Containers typically run thinner and provide a smaller attack surface for vulnerabilities, malware, and other attacks by bad actors. No, not Lorenzo Lamas, but anyone who is attempting to breach your environment. We will usually hear them being referred to as bad actors.

The truth is that containers as a construct, are solving deployment challenges first. Security is a secondary win that implies you have the practices in place to assure that security is greater than that of a traditional virtual machine. Containers are leveraging namespaces and other methods if isolation with the underlying server host to provide some potentially powerful protection. It does not mean that by default the container version of your application is more secure. It means that at the lowest possible layers, not including poor application code, SQL injection, XSS and many of a thousand different other attack vectors are solved by deploying in a container versus a traditional virtual machine.

The long and the short of it is that security products, or any technology products for that matter, are inherently insecure unless you deploy them with all of the practices in place around them to ensure the security.

This conversation on Twitter is a nice way to show how challenging it is to convey the message:

One Vault to Secure Them All: HashiCorp Releases Vault Enterprise 0.7

There are a few key reasons that you need to look at Vault by HashiCorp. If you’re in the business of IT on the Operations or the Development side of the aisle, you should already be looking at the entire HashiCorp ecosystem of tools. Vault is probably one that has my eye the most lately other than Terraform. Here is why I think it’s important:

  • Secret management is difficult
  • People are not good at secret management
  • Did I mention that secret management was difficult?

There are deeper technical reasons around handling secrets with automated deployments and introducing full multi-environment CI/CD, but the reality for many of the folks who read my blog and who I speak to in the community is that we are really early in our traditional application management to next-generation application management evolution. What I mean is that we are doing some things to enable better flow of applications and better management of infrastructure with some lingering bad practices.

Let’s get to the good stuff about HashiCorp Vault that we are talking about today.

Announcing HashiCorp Vault Enterprise version 0.7!

This is a very big deal as far as release go for a few reasons:

  • Secure multi-datacenter replication
  • Expanded granularity with Access Control policies
  • Enhanced UI to manage existing and new Vault capabilities

Many of the development and operations teams are struggling to find the right platform for secret management. Each public cloud provider has their own self-contained secret management tool. Many of the other platform providers such as Docker Datacenter also have their own version. The challenge with a solution that is vendor or platform specific is that you’re locked into the ecosystem.

Vault Enterprise as your All Around Secret Management

The reason that I’ve been digging into lots of the HashiCorp tools over the last few years is that they provide a really important abstraction from the underlying vendor platforms which are integrated through the open source providers. As I’ve moved up the stack from Vagrant for local builds and deployment to Terraform for IaaS and cloud provider builds, the secret management has leapt to the fore as an important next step.

Vault has both the traditional open source version and also the Vault Enterprise offering. Enterprise gives you support, and a few nifty additions that the regular Vault product don’t have. This update includes the very easy-to-use UI:

Under the replication area in the UI we can see where our replicas are enabled and the status of each of them. The replication can ben configured right in the UI by administrators which eases the process quite a bit:

Replication across environments ensures that you have the resiliency of a distributed environment, and that you can keep the secret backends close to where they are being consumed by your applications and infrastructure.  This is a big win over standalone version which required opening up VPNs, or serving over HTTPS which was the way many have been doing it in the past.  Or, worse, they were running multiple vaults in order to host one on each cloud or on-prem environment.

We have response wrapping very easily accessible in the UI:

As mentioned above, we also have the more granular policy management in Vault Enterprise 0.7 as you can see here:

If you want to get some more info on what HashiCorp is all about, I highly suggest that you have a listen to the recent podcasts I published over at the GC On-Demand site including the first with founder Mitchell Hashimoto, and the second with co-foudner Armon Dadgar. Both episodes will open up a lot of detail on what’s happening at HashiCorp, in the industry in general, and hopefully get you excited to kick the tires on some of these cool tools!

Congratulations to the HashiCorp team and community on the release of Vault Enterprise 0.7 today!  You can read up on the full press release of the Vault Enterprise update here at the HashiCorp website.

Customizing the Turbonomic HTML5 Login Screen Background

DISCLAIMER:  This is currently unsupported as any changes made to your Turbonomic login page may be removed with subsequent Turbonomic application updates.  This is meant to be a little bit of fun and can be easily repeated and reversed in the case of any updates or issues. Sometimes you want to spice up your web view for your application platforms.

This inspiration came from William Lam  as a little fun add on when you have a chance to update your login screen imagery. With the new HTML5 UI in Turbonomic it is as easy as one simple line of code to add a nice background to your login screen. Here is the before:

Since I’m a bit of a space fanatic, I want to use a little star-inspired look:

To add your own custom flavor, you simply need to remotely attach to your TAP instance over SSH, browse to the

/srv/www/htdocs/com.vmturbo.UX/app directory, and then modify the BODY tag in the index.html file.

Scroll down to the very bottom of the file because it’s the last few lines you need to access. Here is the before view:

Here is the updated code to use in your BODY tag:

body style="background-image: url(BACKGROUNDIMAGEFILENAME);background-size: contain;background-repeat: no-repeat;background-color: #000000"‍‍‍‍‍‍‍

This is the code that I’ve used for a web-hosted image:

body style="background-image: url(;background-size: contain;background-repeat: no-repeat;background-color: #000000"‍‍‍‍‍‍‍‍

Note the background-color tag as well.  That is for the overflow on the screen when your image doesn’t fill the full screen height and width.  I’ve set the background to be black for the image I’ve chosen. You can also upload your own custom image to your Turbonomic instance into the same folder, but as warned above, you may find that this update has to happen manually as you do future application updates to the Turbonomic environment.

For custom local images, the code would be using a local directory reference.  For ease of use, upload the image file right to the same folder and you can simply use the filename in the CSS code. The real fun is when you get to share your result.

I’d love to see your own version of the custom login screen. Drop in a commend below with your example and show how you liven up your Turbonomic instance with a little personalized view.

Using Terraform to Install DevStack on DigitalOcean

There are a few times where having a persistent OpenStack lab on a shared infrastructure is handy. I’ve been revisiting DevStack a lot more lately in order to help a few folks get their labs up and running. DevStack is the OpenStack project which lets you run non-production OpenStack using either a single or a multi-node configuration. Running on DigitalOcean means that I can have a lab that can spin up quickly (about 40 minutes) and also lets me find another handy use for Terraform.

NOTE: This uses an 80$/month DigitalOcean droplet, so please keep that in mind as you experiment.

Requirements for this are:

Getting the Code

All of the scripts and configuration are on GitHub for free use and are also open for contributions and updates if you see anything that you’re keen to add to. Remember that Terraform uses state files to manage your environment, so when you pull down the GitHub repo and launch your environment, it will create the .tfstate and .tfstatebackup files after you launch for the first time.

Grab the code using git clone to bring it down locally:

Change directory into the /terraform-samples/DigitalOcean/devstack folder where we will be working:

Make sure you have the environment variables setup including the DigitalOcean API token, SSH key file locations, and your SSH fingerprint. These can be exported into your environment using a script or as one-off commands:

The process that is run by the code is to:

  • Pull the DigitalOcean environment needs (API and SSH info)
  • Launch an 8 GB RAM droplet in the NYC2 region and attach your SSH fingerprint
  • Insert the DevStack build script (files/ as a cloud-init script

Those are the pre-requirements. Now it’s time to get started!

Launching the DevStack Build on DigitalOcean with Terraform

It’s always good to use a health check flow of your Terraform builds. Start by validating, running the plan, and then launching. This ensures that you have a good environment configuration and the process should work smoothly.

terraform validate

No news is good news. The code validated fine and we are ready to run the terraform plan command to see what will transpire when we launch the build:

We can see a single droplet will be created because we have nothing to start with. There are a number of parameters that are dynamic and will be populated when the environment launches. Time to go for it!

terraform apply

This is where you need a little bit of patience. The build takes approximately 45-60 minutes. We know the IP address of the environment because we requested it via the Terraform outputs. You can confirm this at any time by running the terraform output command:

Checking the DevStack Install Progress using the Cloud-Init Log

Let’s connect via SSH to our DigitalOcean droplet so we can monitor the build progress. We use the build script as a cloud-init script so that it launches as root during the deployment. This means you can keep track of the results using the /var/log/cloud-init.log and the /var/log/cloud-init-output.log files.

Install completion is indicated by a set of log results like this:

Let’s try it out to confirm using the OpenStack Horizon dashboard URL as indicated in the cloud-init output. There are two accounts created by the script which are admin and demo, both of which have secret-do as the default password.

NOTE: Please change your OpenStack passwords right away! These are simple, plain-text passwords that are packaged with the build and you are vulnerable to attack

That gets us up and running. You are incurring charges as long as the environment is up, so when you’re ready to bring the environment down and destroy the droplet, it’s as easy as it was to launch it.

Destroying the DevStack DigitalOcean Build Using Terraform Destroy

In just two quick words and a confirmation we can remove all of the environment: terraform destroy

Just like that we have installed an all-in-one OpenStack DevStack node on DigitalOcean and learned another nifty way to leverage Hashicorp Terraform to do it.

Adding SSH Access for DigitalOcean when Using Terraform

We’ve been looking at how to add a little Terraform into your IT infrastructure provisioning toolkit lately. Using DigitalOcean is also super easy and inexpensive for testing out processes and doing things like repetitive builds using Terraform.

The first post where we saw how to do a simple Terraform environment build on DigitalOcean appeared at my ON:Technology blog hosted at Turbonomic. That gave us the initial steps for a quick droplet deployment.

We also talked about how to access your DigitalOcean droplets via the command line using SSH keys here which is very important. The reason that it is important is that without SSH keys, you are relying on using the root account with a password. DigitalOcean will create a complex password for you when deploying your droplet. This is not something you can find out without actually resetting the root password and restarting your droplet. This is both insecure (reverting to password access instead of SSH key pair) and also disruptive because you are rebooting the instance to do the password reset.

Now it’s time to merge these two things together!

Adding SSH key details to the Terraform DigitalOcean provider

We are going to add a few things to what we have already done in those two other posts. You will need the following:

Getting your SSH fingerprint is a simple process. Start by going to the top right of your DigialOcean console to the icon which has a dropdown for your account settings:

In the profile page, choose the Settings option from the menu on the left-hand panel:

The SSH fingerprint that you’ll need is in the security settings page. Keep this somewhere as safe as you would your SSH keys themselves because this is an important piece of security information.

Using the SSH Details in Environment Variables

Our settings are going to be stored using local environment variables just like with our DigitalOcean key was in the first example blog. Because we have a few other things to keep track of now we will see the changes in the file:

Our environments variables are going to have the same format which is TF_VAR_digitalocean_ssh_fingerprint which is your fingerprint you got from the security settings. The other two things we need are the TF_VAR_digitalocean_pub_key and TF_VAR_digitalocean_private_key parameters which are the paths to your local SSH key files.

NOTE: The use of the file locations is actually not needed for basic key configuration using Terraform. I just thought we should set that up which will come to use later on in other blogs around using Terraform with DigitalOcean.

Use the export command to sett up your variables.  Our Terraform file contains an extra config parameter now which you’ll see here:

These new parameters will read in all that we need to launch a new droplet, attach the appropriate SSH key by the fingerprint in DigitalOcean, and then to allow us to manage the infrastructure with Terraform.

Time for our routine, which should always be: terraform validate to confirm our syntax is good followed by a terraform plan to test the environment:

Now we run our terraform apply to launch the droplet:

Now we have launched a droplet on DigitalOcean with Terraform. Use the SSH command line to connect to the droplet as the root account. Make sure you’ve done all the steps in the previous blog to set up your ssh-agent and then you should be all set:

This is the next step in making more secure, repeatable, and compassable infrastructure using Terraform on DigitalOcean. These same methods will also be showing up as we walk through future more complex examples on DigitalOcean and other providers.

Let’s clean up after ourselves to make sure that we take advantage of the disposable and elastic nature of our public cloud infrastructure by very easily running the terraform destroy command to remove the droplet:

Hopefully this is helpful!