Well, that was fast! Another year has come and gone. It is safe to say 2020, ’21 and ’22 were exceptional, and only sometimes for good reasons. But I take heart in society’s steady progress toward digital maturity through it all. Nearly 100% of IT leaders say the pandemic accelerated their organization’s rate of digital transformation. We have seen increased interest in automation and orchestration as a result, which in turn accelerates artificial intelligence for IT operations (AIOps) deployment.
The last decade has seen exponential investment in DevOps and AIOps, which is all the more striking when you remember that AIOps as a concept did not exist 15 years ago. Considering recent headway, it is valuable to review how far AIOps has come and use that progress roadmap to determine where the industry will trend in 2023 (and beyond!). Now, reach into the back of your fridge, pour yourself some leftover eggnog and let us get into it.
AIOps 1.0 is nearing completion
The initial wave of AIOps offerings came to market more than a decade ago. For context, when I founded Moogsoft in 2011, DevOps was a young movement, most organizations hosted their data on-prem and continuous integration/continuous delivery (CI/CD) philosophies were revolutionary. Clearly, our digital landscape has changed dramatically since then. Now, scattered tech stacks and multi-cloud data rule the day.
AIOps solutions that operate on decade-old inferences will ultimately stumble when faced with this modern IT infrastructure. This is especially true for event-only workflows over-relying on monitoring and domain-specific solutions prioritizing one dimension of AIOps over another. In effect, these tools focus on the “what?” or the “how?” as opposed to the complete picture: who, what, when, where and why. As workflows become more interconnected and complicated, comprehensive and immediate solutions will be critical. Accordingly, tools that support the entire change management process will define AIOps 2.0.
Data is consolidating
AIOps 2.0 will be defined by data consolidation. A proper AIOps workflow begins by ingesting and normalizing disparate source data, including log events, traces and metrics. By normalizing these data, AIOps can reduce incident volume and event noise, all while increasing situational awareness thanks to ML. DevOps and site reliability engineer (SRE) teams can use this aggregate analysis to identify problems before they arise, reducing mean time to recover (MTTR) and increasing overall system uptime.
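To make the ingest-and-normalize step concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Event` schema, the raw record fields (`kind`, `host`, `level`, `threshold` and so on) and the grouping rule are hypothetical, not the format of any particular product. It also uses a plain severity filter where a real AIOps tool would apply ML-based correlation, so read it as a picture of the data flow only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical common schema shared by logs, metrics and traces."""
    source: str
    service: str
    severity: str
    message: str


def normalize(raw: dict) -> Event:
    """Map one raw record (assumed field names) onto the common schema."""
    kind = raw["kind"]
    if kind == "log":
        return Event("log", raw["host"], raw["level"].lower(), raw["msg"])
    if kind == "metric":
        severity = "critical" if raw["value"] > raw["threshold"] else "info"
        return Event("metric", raw["service"], severity,
                     f'{raw["name"]}={raw["value"]}')
    if kind == "trace":
        severity = "critical" if raw["error"] else "info"
        return Event("trace", raw["service"], severity, raw["span"])
    raise ValueError(f"unknown kind: {kind}")


def group_by_service(events):
    """Drop informational noise and group what remains per service."""
    incidents = defaultdict(list)
    for event in events:
        if event.severity in ("critical", "error"):
            incidents[event.service].append(event)
    return dict(incidents)


raw_feed = [
    {"kind": "log", "host": "checkout", "level": "ERROR", "msg": "db timeout"},
    {"kind": "metric", "service": "checkout", "name": "latency_ms",
     "value": 950, "threshold": 500},
    {"kind": "trace", "service": "search", "error": False, "span": "GET /q"},
]
incidents = group_by_service(normalize(r) for r in raw_feed)
```

In this toy feed, three records from three different sources collapse into a single candidate incident for the `checkout` service, which is the noise reduction described above in miniature.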
Again — this is the proper AIOps workflow on which we built Moogsoft a decade ago. But many solutions on the market claiming to be AIOps stick to a problematic rules-based approach that treats IT infrastructure and data as static. Of course, IT systems are dynamic, as is data, and new problems arise by the hour. When so-called AIOps solutions ignore this fact, SRE teams and DevOps engineers lose access to the entire picture and inevitably miss concerning patterns in their data.
But the dog days of undynamic data are nearing an end. Authentic AIOps tools that consider the entire incident lifecycle and simplify the event management process will soon win the day as data sprawl creeps ever higher.
AIOps will integrate with (and improve) IT service management (ITSM)
Current ITSM tools are largely unchanged from the troubleshooting systems of yesteryear (or, in this case, those of about 30 years ago). Employees submit tickets using web forms attached to finite-state machine (FSM) logic. An FSM-based system escalates tickets to an administrator but cannot process the request’s colloquial context. For example, let us say an ITSM process starts with a verbal conversation around the water cooler. In the ticket arising from said conversation, employees may paraphrase their problem, preventing the system from reliably automating the review process.
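For readers who have not worked with one, an FSM-driven ticket flow can be sketched in a few lines of Python. The states, actions and transition table below are hypothetical and far simpler than a real ITSM product. The point is only that the machine reacts to a fixed set of actions and never looks at what the employee actually wrote.

```python
from enum import Enum


class State(Enum):
    SUBMITTED = "submitted"
    ESCALATED = "escalated"
    RESOLVED = "resolved"


# Hypothetical transition table: (current state, action) -> next state.
TRANSITIONS = {
    (State.SUBMITTED, "escalate"): State.ESCALATED,
    (State.ESCALATED, "resolve"): State.RESOLVED,
}


def step(state: State, action: str) -> State:
    """Advance the ticket; any action not in the table is rejected."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} not allowed from {state.name}") from None


ticket = {
    "description": "the thing we talked about at the water cooler is slow again",
    "state": State.SUBMITTED,
}
# The free-text description plays no part in the transition.
ticket["state"] = step(ticket["state"], "escalate")
```

The ticket reaches an administrator, but nothing in the machine can tell what "the thing we talked about" refers to, which is the gap described above.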
AIOps presents a few opportunities for improvement to the ITSM workflow. First, AIOps tools can automate ticket creation after anomaly detection, creating a process with far greater context and fewer false positives. And with exposure to human-generated tickets, the system can understand the semantic context of future tickets via ML. For internal IT teams dealing with a monstrous ticket backlog, the value here is likely apparent: teams spend less time troubleshooting and more time addressing root issues.
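As a rough illustration of automated ticket creation, the Python sketch below opens a ticket only when a new metric reading deviates sharply from its recent baseline. The z-score check is a deliberately simple stand-in for the ML a real AIOps tool would use, and the function names, the threshold of 3.0 and the ticket fields are all assumptions made for this example.

```python
from statistics import mean, stdev


def detect_anomaly(history, latest, z_threshold=3.0):
    """Return True if `latest` is more than `z_threshold` standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold


def maybe_open_ticket(service, metric, history, latest):
    """Create a context-rich ticket dict on anomaly, otherwise None."""
    if not detect_anomaly(history, latest):
        return None
    return {
        "service": service,
        "summary": f"{metric} anomaly on {service}",
        "observed": latest,
        "baseline_mean": round(mean(history), 2),
        "recent_values": list(history[-5:]),
    }


latency_history = [100, 102, 98, 101, 99, 100]
quiet = maybe_open_ticket("checkout", "latency_ms", latency_history, 101)
alert = maybe_open_ticket("checkout", "latency_ms", latency_history, 180)
```

Because the ticket is generated from the data itself, it arrives with the baseline and the recent readings attached, rather than with a paraphrase of a hallway conversation. An ordinary reading such as 101 produces no ticket at all.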
Overall, I am happy to report the future of AIOps is looking bright. True AI will define the next era, and ML-backed tools that consolidate data to create an accurate timeline of the incident lifecycle will win the day. Moreover, IT leaders will soon apply those solutions across other workflows, including tedious ITSM. That means faster error identification and less system downtime. Lower mean time to detect and MTTR? I would call that an extremely admirable New Year’s resolution.