There is no question that AIOps is the next big thing in IT. The only way to deal with the burgeoning complexity and accelerating rate of change in modern IT environments is to abandon traditional model-based management, and adopt algorithm-powered analytics instead.
As more and more IT professionals come to grips with the scale of the problem, adoption of AIOps is moving from niche to widespread. In fact, analysts at Gartner have estimated that, by 2020, approximately 50% of enterprises will be actively using AIOps platforms to provide insight into both business execution and IT operations.
However, as with any hot new topic, it can sometimes be difficult to make sense of all the offerings clamoring to be considered under the moniker of AIOps. This is a new segment, so by definition, AIOps products cannot have decades-long track records or thousands-strong user bases. While some legacy tools from previous technology generations have attempted to rebrand as “AIOps,” the market can generally spot the dads at the skate park pretty quickly.
This still leaves quite a few offerings to sift through, though. Gartner list 20 vendors in their Market Guide for AIOps Platforms. Helpfully, the various solutions listed by Gartner are further classified by the breadth of their coverage and the particular capabilities that they offer.
Reading through that list, it quickly becomes obvious that most of the offerings are single-purpose tools, with only a subset of the 11 capabilities that Gartner considers necessary for full coverage. Indeed, Moogsoft is the only vendor on the list to offer a single platform that covers all of these required capabilities.
Historical Pattern-Matching versus Real-Time Detection
One common approach is pattern matching, particularly on historical data such as logs. If a pattern that leads to an incident can be identified, any recurrence of that pattern can be flagged so that operators can resolve the issue more rapidly, or even prevent it entirely. This is, of course, a useful technique, and one that the Moogsoft AIOps platform also includes, but on its own it is not sufficient.
Tools that can only identify recurring issues are by definition blind to problems that are occurring for the first time. The catch is that the very mechanisms driving interest in AIOps, namely rapid growth in scale and an accelerating rate of change, also mean that a growing proportion of incidents are occurring for the first time. Many Moogsoft AIOps users find that fewer than half of their issues are recurring. Algorithms that detect patterns in real time will still flag these novel events as significant and map their correlations, but tools that can only match events against known patterns will miss all of these “unknown unknowns.”
What Is, and Is Not, Machine Learning
Because of the appeal AIOps has for buyers, some vendors have also attempted to shoehorn functionality into their offerings that is not, strictly speaking, AIOps at all. One recent claim centers on the concept of “open box machine learning,” positioned as an alternative to the supposedly closed nature of actual machine learning.
Users getting to grips with machine learning for the first time can sometimes struggle with its “black box” nature, often described as the interpretability or explainability problem: it is not always clear to users why a certain result was produced. Further, the way to influence the behavior of the system is not to edit code or change the value of variables, but to rate results as more or less desirable and allow the system to learn and evolve over time, producing results that are increasingly close to what users consider good.
The reality is that there is no “box” to open. With true machine learning, the logic that led to a certain result is encoded in learned parameters rather than in explicit, human-readable rules, so it cannot simply be read off. Setting a breakpoint in the middle of execution will not reveal a step-by-step decision path to debug. What these claims of “open box machine learning” generally amount to is some form of modelling, perhaps with dynamic thresholding or similar logic added. That can be valuable, but it is not machine learning. Busy sysadmins may find this confusing at first, but will quickly come to appreciate the value of AIOps in removing many of the annoyances of day-to-day life in IT.
Why Chaining Multiple AIOps Tools Doesn’t Work
Users confused by these detailed differences might be tempted to build their own solution by adopting various tools and integrating them afterwards. While this “ITOps toolchain” approach is perfectly valid in general, AIOps requires a single source of truth. If two tools operate in sequence on the same event data, the second is looking at events that have already been filtered by the first, which may have made different decisions or flagged different patterns. A complete AIOps platform that incorporates multiple algorithmic approaches beats a chain of single-purpose tools because otherwise users have no definitive source of truth to rely on. Such a platform can itself still form part of a wider toolchain, without duplicate, conflicting filtering.
Learn From People Who Did It Already
As the AIOps market continues to grow and develop, there is still some way to go before these distinctions become widely understood. Early adopters are sharing their own experiences and results with AIOps, which helps others to ask better questions along their own AIOps adoption journey. The 2018 AIOps Buyer’s Guide goes into some of the topics to keep in mind when considering a move into AIOps. This in turn helps to smooth the progress of digital transformation projects and supports successful adoption of AIOps.