Nearly every company is going through some form of cloud adoption: migrating to the cloud for the first time, adopting a hybrid cloud, or extending cloud-native deployments with newer microservice architectures.
But, according to a recent survey by Aptum*, only 39% of companies are completely satisfied with their current rate of digital transformation.
Cloud adoption projects create a continuous state of change for engineering teams, who must juggle keeping things up and running with limiting the impact on customers. Yet when customers are impacted, it is often dismissed as unavoidable, the “cost” of moving to the cloud.
So how do you limit customer impact and keep your sanity? Read on for a summary of the webinar “Continuous Availability vs. Continuous Change: How to manage cloud adoption without impacting your users” with Moogsoft’s Chief Architect, Sean Molloy, and Jason, an analyst from Intellyx, or watch the full session here.
Issues that Arise when Moving to the Cloud
An interesting problem with the cloud is that it’s so easy to create things. For most people, cloud adoption means going from one of each thing to N of each thing, where N varies with your needs. Under the hood, the cloud is just servers managed through software, sitting in a big data center like any other.
What is very different is that a single mouse click gives you another instance. So you end up with thousands of things instead of tens of things. Most of what teams deal with when moving to the cloud is this transition from managing one of something to managing a large multiple of it.
Planning the Transition to the Cloud
Keep your plans small. If you create a two-year plan built around a series of milestones, but the cloud environment isn’t usable in between those milestones, the plan is largely doomed to fail. Working in small increments may seem like it will take longer overall, and it may, but each baby step is far more attainable than trying to deliver everything at a single point in time, which rarely works.
Changing small pieces over time does work, and it also gives you solid footing in how those servers operate and how to manage them. Plan for baby steps and you’ll see a more immediate return. The overall journey may take longer, but it pays off in the long run.
How AIOps Helps Lower the Risk of Going Live in the Cloud
When you move to cloud-based computing, you tend to have N copies of things. As a result, you end up with a lot of chatter and a lot more to monitor.
Instead of one CPU counter, you have 32 CPU counters, one for each of the little pods that are running. The important point behind the term AIOps is that it helps to have computers watching the computers for you, because with so many counters, humans can no longer just watch a single CPU or memory dashboard.
So a system that watches your systems becomes very important for knowing when one component, or one of the 32 copies of a piece of your infrastructure, is not behaving correctly. The worst thing that can happen to a distributed cluster of machines is having one “drunk” member. We often talk about distributed computing clusters that work great when every member is running, or when you add or remove a member. But when one node in a cluster is slow, all the other members start to slow down or break because of it.
Knowing that one of the members is misbehaving is really helpful. AIOps systems are designed to pick out cases where one thing is different from the others, or where many things are similar to each other. If 42 things all seem to have the same problem, maybe there is 1 problem, not 42 problems. You don’t want to get paged 42 times; you want to get paged once. That’s where the AIOps brainpower comes in.
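A minimal sketch of spotting the “drunk” member, assuming each pod reports a single latency (or CPU) value: compare every member against its peers with a median-based modified z-score, a standard robust-statistics technique. This is an illustration, not necessarily how Moogsoft detects outliers; the function name, the 3.5 threshold and the sample data are hypothetical.

```python
from statistics import median


def find_misbehaving(metrics: dict[str, float], threshold: float = 3.5) -> list[str]:
    """Return the members whose metric is far from their peers.

    Uses the modified z-score 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. Median-based statistics are not dragged
    toward the one bad member the way a mean and standard deviation are.
    """
    values = list(metrics.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most members report exactly the same value: flag any that differ.
        return [name for name, v in metrics.items() if v != med]
    return [
        name
        for name, v in metrics.items()
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical data: 31 healthy pods at 20-23 ms, one "drunk" pod at 180 ms.
latencies_ms = {f"pod-{i}": 20.0 + (i % 4) for i in range(31)}
latencies_ms["pod-31"] = 180.0
print(find_misbehaving(latencies_ms))  # ['pod-31']
```

The median is used instead of the mean on purpose: a single very slow node would pull the mean and standard deviation toward itself and partly hide the very anomaly you are looking for.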
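The “page once, not 42 times” idea can be sketched as simple time-window grouping: alerts that share a normalized symptom and arrive within a few minutes of each other are folded into one incident. This is a deliberately naive, hypothetical heuristic (the `Alert` and `Incident` types and the 300-second window are assumptions); production AIOps platforms use more sophisticated correlation.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    timestamp: float  # seconds since epoch
    source: str       # host or pod that raised the alert
    symptom: str      # normalized description, e.g. "disk latency high"


@dataclass
class Incident:
    symptom: str
    first_seen: float
    sources: list[str] = field(default_factory=list)


def group_alerts(alerts: list[Alert], window_s: float = 300.0) -> list[Incident]:
    """Fold alerts with the same symptom into one incident while they arrive
    within window_s seconds of that incident's first alert."""
    incidents: list[Incident] = []
    open_incidents: dict[str, Incident] = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = open_incidents.get(alert.symptom)
        if current is None or alert.timestamp - current.first_seen > window_s:
            # No open incident for this symptom, or it is too old: start one.
            current = Incident(alert.symptom, alert.timestamp)
            open_incidents[alert.symptom] = current
            incidents.append(current)
        current.sources.append(alert.source)
    return incidents


# 42 nodes report the same symptom within about 40 seconds.
alerts = [Alert(1000.0 + i, f"node-{i}", "disk latency high") for i in range(42)]
incidents = group_alerts(alerts)
print(len(incidents), len(incidents[0].sources))  # 1 42
```

One incident with 42 sources means one page, and the list of sources still tells the on-call engineer how widespread the problem is.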
Continuous Availability is not Perfection
Perfection is impossible, and planning for a world without outages is impossible. Availability is now about managing within budgets and managing the impact on your users.
Invest in automation and in deployment by robots. Make sure that whatever you build, you can redeploy very quickly. Know which kinds of operations you can defer and retry, and embrace the concept of eventual consistency. Much of this elasticity comes from allowing a few seconds of lag in a queue here and there.
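The post does not say exactly which budgets it means. One common reading, borrowed from site reliability engineering, is an error budget: the downtime an availability target allows over a window. Under that assumption, the arithmetic for a hypothetical 99.9% target over 30 days is a short sketch:

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability target (e.g. 0.999)
    over a window of window_days. The targets used below are hypothetical."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def remaining_budget_minutes(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Budget left after the downtime already spent; negative means overspent."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


budget = error_budget_minutes(0.999)          # 30 days at 99.9%
left = remaining_budget_minutes(0.999, 12.0)  # after 12 minutes of outages
print(f"{budget:.1f} min budget, {left:.1f} min left")  # 43.2 min budget, 31.2 min left
```

Framed this way, a risky cloud migration step becomes a question of whether it fits in the remaining minutes rather than whether it could ever cause an outage.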
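“Defer and retry” is commonly implemented as retry with capped exponential backoff and jitter, so a brief blip is absorbed as a few seconds of lag instead of an error surfaced to users. The sketch below is a hypothetical helper, not part of any product; `flaky_write` simulates a dependency that fails twice before recovering, and `sleep` is injectable so the example runs instantly.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.1,
    max_delay_s: float = 5.0,
    retryable: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or max_attempts is reached.

    Between attempts, sleep a random time up to a capped exponential delay
    ("full jitter") so many clients don't retry in lockstep. Errors not in
    `retryable` propagate immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last transient error
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0.0, delay))
    raise AssertionError("unreachable")


calls = 0


def flaky_write() -> str:
    """Hypothetical write that fails twice, then succeeds."""
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("temporary outage")
    return "written"


result = retry_with_backoff(flaky_write, sleep=lambda s: None)
print(result, calls)  # written 3
```

Only transient error types are retried by default; retrying a validation error would just repeat the same failure more slowly.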
For the full experience, watch the webinar here.
Then sign up for a free trial of Moogsoft’s AIOps platform.