infoTECH Feature

January 30, 2015

The Cost of Monitoring (but Not Automating)

By TMCnet Special Guest
Leon Adato, Head Geek, SolarWinds

A Sequel to The Cost of (NOT) Monitoring

Any IT professional who has used monitoring tools for any length of time is most likely perfectly comfortable setting up new devices, such as servers, routers, switches and the like. Adding sub-elements, such as disks and interfaces, is probably a snap, too. There’s also a good chance they’ve set up their fair share of reports and data exports. Not to mention alerts.

But where it all comes to a head is what to do with those alerts.

Most IT pros who use monitoring tools set up email or text message forwarding so alerts reach them on their mobile devices. Those who are especially ambitious might even set up automated ticketing in whatever incident system their company uses. But once that’s set up, they usually call it a day.

What follows from there on out is a pretty regular occurrence: A monitoring tool will detect an error, a notification will be sent out, a human will be launched into action of some kind and the problem will (eventually) be resolved.

But why? Why disturb a living, breathing, working—or sleeping—person if a computer can do something about the situation? The fact is that many monitoring alerts have a simple response that can often be automated within the monitoring tool itself, saving many hours of personnel time and resources—in short, helping the bottom line.

Consider these simple examples that are far too often overlooked:

Alert: XYZ service is down

Automated response: Attempt service restart

Alert: Disk is over X percent full

Automated response: Clear standard TEMP folders

Alert: IP address conflict detected

Automated response: Shut down port of newer device

The list could go on and on.
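
To make this concrete, here is a minimal sketch of what the first two responses might look like as a script the monitoring tool runs on the affected machine. Everything in it is illustrative: it assumes a Linux host with systemd, the service name "myservice" is a placeholder, and any real monitoring tool will have its own mechanism for invoking remediation actions.

# remediate.py -- illustrative sketch of simple automated responses.
# Assumes a Linux host with systemd; names and paths are placeholders.
import os
import shutil
import subprocess

TEMP_DIRS = ["/tmp", "/var/tmp"]  # the "standard TEMP folders" for this example

def restart_service(name: str) -> bool:
    """Attempt a service restart; return True if systemd reports success."""
    result = subprocess.run(["systemctl", "restart", name])
    return result.returncode == 0

def clear_temp_folders() -> None:
    """Delete the contents of the standard TEMP folders (not the folders themselves)."""
    for folder in TEMP_DIRS:
        for entry in os.listdir(folder):
            path = os.path.join(folder, entry)
            try:
                if os.path.isdir(path) and not os.path.islink(path):
                    shutil.rmtree(path)
                else:
                    os.remove(path)
            except OSError:
                pass  # a file in use is not worth failing the whole cleanup

if __name__ == "__main__":
    restart_service("myservice")  # hypothetical service name
    clear_temp_folders()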

If at any point an automated response is not successful, a proper monitoring tool will trigger a secondary action—those emails, text messages or tickets mentioned above. At worst, the notification arrives a few minutes later than it otherwise would, but only because the monitoring system has already done what a human technician would have done first after logging in. The path to resolution is therefore still ahead of where it would be had the automated response not been in place.
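
As a sketch of that flow (again hypothetical, with send_ticket standing in for whatever email, text message or ticketing path is actually in place), the automated fix runs first and a human is pulled in only if a re-check shows the problem persists:

# escalate.py -- sketch of "automate first, notify only if still broken".
# The 90 percent threshold and the 15-minute re-check are illustrative.
import shutil
import time

from remediate import clear_temp_folders  # from the sketch above

THRESHOLD = 90  # percent full at which the alert fires

def disk_percent_full(path: str = "/") -> float:
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100

def send_ticket(message: str) -> None:
    """Stand-in for the real notification path (email, text message, ticket)."""
    print("TICKET:", message)

def handle_disk_alert(path: str = "/") -> None:
    if disk_percent_full(path) < THRESHOLD:
        return  # nothing to do
    clear_temp_folders()   # the automated first response
    time.sleep(15 * 60)    # give the cleanup time to take effect, then re-check
    if disk_percent_full(path) >= THRESHOLD:
        send_ticket(f"Disk {path} is still over {THRESHOLD}% full after clearing TEMP folders")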

The possibilities for automation don’t end there, though. Effective monitoring tools also allow you to automatically collect additional diagnostic information at the time of the alert and then “inject” it into the alert itself (a sketch of this kind of enrichment follows the list). For example:

Alert: CPU utilization is over X percent

Automated response: Identify the top 10 processes, sorted by CPU usage

Alert: RAM utilization is over X percent

Automated response: Identify the top 10 processes, sorted by RAM usage

Alert: VM is using more than X percent of host resources

Automated response: Identify VM by name

Alert: Disk is still over X percent full after clearing TEMP folders

Automated response: Scan disk for top 10 files, sorted by size, that have been added or updated in the last 24 hours
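
Here, likewise, is a rough sketch of how that kind of diagnostic detail could be gathered. It uses the third-party psutil library for the process lists and a plain directory walk for the file scan; the output format, and how the text actually gets injected into the alert, depend entirely on the monitoring tool in use.

# enrich.py -- sketch of gathering "top 10" diagnostics to inject into an alert.
import os
import time
import psutil  # third-party: pip install psutil

def top_processes(key: str, count: int = 10) -> list[str]:
    """Top `count` processes by 'cpu_percent' or 'memory_percent'."""
    if key == "cpu_percent":
        # cpu_percent is measured between calls, so prime once and sample briefly
        for proc in psutil.process_iter():
            try:
                proc.cpu_percent()
            except psutil.Error:
                pass
        time.sleep(1)
    rows = [p.info for p in psutil.process_iter(["pid", "name", key])
            if p.info[key] is not None]
    rows.sort(key=lambda p: p[key], reverse=True)
    return [f"{p['pid']:>7}  {p[key]:6.1f}  {p['name']}" for p in rows[:count]]

def recent_large_files(root: str, hours: int = 24, count: int = 10) -> list[str]:
    """Top `count` files under `root` added or updated in the last `hours`, by size."""
    cutoff = time.time() - hours * 3600
    found = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue
            if st.st_mtime >= cutoff:
                found.append((st.st_size, path))
    found.sort(reverse=True)
    return [f"{size:>12} bytes  {path}" for size, path in found[:count]]

if __name__ == "__main__":
    # A real tool would append these lines to the alert or ticket body.
    print("Top 10 processes by CPU:")
    print("\n".join(top_processes("cpu_percent")))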

But does this type of monitoring automation really impact the bottom line? The answer is a resounding yes.

Case in point: A company recently implemented nothing more sophisticated than the disk-related automated responses outlined above—clearing the TEMP folders and re-alerting after 15 minutes if the disks were still full—along with adding the top 10 processes to the high-CPU alert.

The results ranged from 30 percent to 70 percent fewer alerts compared to the same month the previous year—in real numbers, 43 to 175 fewer alerts per month. In addition, the support staff responded faster to the remaining alerts because they knew the automated initial actions had already been taken.

The number of CPU-related alerts obviously didn’t drop, but once again, support staff response improved because the tickets included information about what, specifically, was going wrong. In one case, the company was able to go back to a vendor and request a patch because it could finally prove a long-standing issue with the software.

As virtualization and falling costs—coupled, thankfully, with expanding budgets—push the growth of IT environments, the need to leverage monitoring to ensure the stability of computing environments becomes ever more obvious. Less obvious, but just as critical and valuable, is the need to ensure that the human cost of that monitoring remains low by implementing a monitoring tool that facilitates automation, and by actually leveraging those automation capabilities.




Edited by Maurice Nagle