Loggly statistical alerts: Fewer false alarms, more sleep

Your alerts don’t have to be your alarm clock

By Pranay Kamat 18 Oct 2017

Do you no longer need an alarm clock because your application's alerts wake you up needlessly every morning? Then read on.

Hundreds of customer interviews have taught me that it's genuinely hard to use alerts to monitor modern systems built from APIs, containers, microservices, and other integrations. It's even harder to decide which alerts to set. Even the most battle-hardened DevOps leaders admit that choosing the right alert threshold is a mix of art and science, with some luck thrown in. So it doesn't surprise me that many recent DevOps reports rank alert fatigue among the top two or three pain points in the industry.

Our recently released Loggly 3.0 is designed for DevOps monitoring and makes it easier for you to set the right kinds of alerts and eliminate alert fatigue. In this release, we have introduced a new alert feature that lets you use statistical measures to identify sudden changes. Statistical alerts help you catch a range of issues that, if left unattended, could have a significant business impact.

Sniff a sudden change

Seismographs can detect even the faintest tremors in the Earth's crust. What if you had an equally sensitive tool for detecting changes in your apps and user activity? Previously, you had to specify a static threshold when defining count-based alerts. With our new alert feature, you can specify the threshold in relative terms using standard deviations. The standard deviation quantifies how much a set of data values varies: a low standard deviation indicates that most values are close to the average, while a high standard deviation indicates that they are spread over a wide range.
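For reference, here is the standard textbook definition of the mean and (population) standard deviation of N baseline values x_1 through x_N, together with the threshold test it implies, where k is the number of standard deviations you choose. This is a general statement of the statistic, not a description of how Loggly computes it internally.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}, \qquad
\text{alert if } x_{\text{current}} > \mu + k\,\sigma
```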

Here's an example of setting up an alert with this new capability: alert when the count of 404 errors in the last 15 minutes is more than two standard deviations above the average for the last six hours. 404 errors are a fact of life, so an absolute threshold often doesn't make sense, but you would certainly want to investigate a sudden spike in them. You can choose to be alerted at one, two, or three standard deviations from the mean. Assuming the counts are roughly normally distributed, about 68%, 95%, and 99.7% of values fall within one, two, and three standard deviations of the mean, respectively, so each higher setting fires only on increasingly unusual values.
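To make the 404 example concrete, here is a minimal Python sketch of the check. It assumes the six-hour baseline is split into 24 fifteen-minute buckets and that the population standard deviation is used; the function name `exceeds_std_threshold` and the bucket layout are hypothetical illustrations, not Loggly's API or its internal implementation.

```python
from statistics import mean, pstdev


def exceeds_std_threshold(baseline_counts, current_count, num_std=2):
    """Return True if current_count is more than num_std standard deviations
    above the mean of baseline_counts.

    Hypothetical helper: baseline_counts holds one event count per window
    (e.g. 404s per 15 minutes) covering the baseline period.
    """
    if not baseline_counts:
        raise ValueError("baseline_counts must not be empty")
    mu = mean(baseline_counts)
    sigma = pstdev(baseline_counts)  # population standard deviation (an assumption)
    return current_count > mu + num_std * sigma


# Six hours of 15-minute buckets = 24 baseline counts (made-up sample data).
baseline = [40, 44] * 12  # mean 42, population standard deviation 2

print(exceeds_std_threshold(baseline, 45))  # False: 45 is below 42 + 2 * 2 = 46
print(exceeds_std_threshold(baseline, 60))  # True: a clear spike
```

Note that a perfectly flat baseline has a standard deviation of zero, so any count above the mean would fire; a production check would probably need a minimum volume or a floor on the threshold for that case.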

So think of all the services you use where you would want to know about a sudden change, and what time period makes a useful baseline. Here are some more examples:

  • Monitoring spikes in user activity errors
  • Monitoring spikes or drops in metrics such as CPU utilization, free memory, etc.
  • Monitoring drops in metrics such as number of users, processed transactions, etc.
  • Monitoring changes in API requests processed
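Several of these examples watch for drops rather than spikes. Below is a minimal sketch of a two-sided check, assuming the same band of the mean plus or minus k standard deviations; the `deviation_alert` function and its `direction` parameter are hypothetical and only illustrate the idea.

```python
from statistics import mean, pstdev


def deviation_alert(baseline, current, num_std=2, direction="above"):
    """Flag current if it falls outside a mean +/- num_std * stdev band.

    direction (hypothetical parameter): "above" flags spikes, "below" flags
    drops, and "either" flags both.
    """
    if not baseline:
        raise ValueError("baseline must not be empty")
    mu = mean(baseline)
    sigma = pstdev(baseline)  # population standard deviation (an assumption)
    upper = mu + num_std * sigma
    lower = mu - num_std * sigma
    if direction == "above":
        return current > upper
    if direction == "below":
        return current < lower
    if direction == "either":
        return current > upper or current < lower
    raise ValueError(f"unknown direction: {direction!r}")


# Processed transactions per window over the baseline period (made-up data).
transactions = [1000, 1040] * 12  # mean 1020, population standard deviation 20

print(deviation_alert(transactions, 900, direction="below"))    # True: a sharp drop
print(deviation_alert(transactions, 900, direction="above"))    # False
print(deviation_alert(transactions, 1070, direction="either"))  # True: above 1060
```

Metrics such as free memory or CPU utilization can move either way, which is why an "either" mode is useful there, while user counts and processed transactions usually only matter when they fall.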

Spot the oddity…

Sometimes you are interested not in the rate of change but in unusual system behavior. In those cases, you need both volume and historical perspective to tell whether a behavior is legitimate or an anomaly. For example, say you want to monitor for an abnormal number of 403 errors. You could set up an alert like this: alert when the percentage of 403 errors in a rolling 30-minute window is, say, 25% higher than over the last day.
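The 403 example compares a share of traffic rather than a raw count. Here is a minimal sketch, assuming "25% higher" means a relative increase over the baseline percentage (not 25 percentage points) and that each request is represented by its HTTP status code; the function names `status_share` and `share_spike` are hypothetical.

```python
def status_share(status_codes, target=403):
    """Percentage of requests in status_codes that returned the target status."""
    if not status_codes:
        return 0.0
    hits = sum(1 for code in status_codes if code == target)
    return 100.0 * hits / len(status_codes)


def share_spike(window_codes, baseline_codes, target=403, max_increase=0.25):
    """True if the target status's share in the rolling window exceeds the
    baseline share by more than max_increase (a relative increase)."""
    current = status_share(window_codes, target)
    baseline = status_share(baseline_codes, target)
    return current > baseline * (1 + max_increase)


# Made-up data: the last day has 20 403s out of 1,000 requests (2%),
# so the alert threshold is 2% * 1.25 = 2.5%.
last_day = [403] * 20 + [200] * 980
busy_window = [403] * 6 + [200] * 194   # 3% of the last 30 minutes
quiet_window = [403] * 4 + [200] * 196  # 2% of the last 30 minutes

print(share_spike(busy_window, last_day))   # True
print(share_spike(quiet_window, last_day))  # False
```

Comparing shares rather than counts keeps the alert stable when overall traffic rises or falls. Because the comparison is relative, though, a baseline with no 403s at all makes any 403 in the window fire the alert.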


Get your real alarm clock back on track

Statistical alerting is available to all Loggly customers with paid subscriptions and all Loggly users participating in a 14-day free trial. If you’re feeling even a hint of alert fatigue, I really suggest that you give it a try.

Don’t have a Loggly account? Try it now for free!

The Loggly and SolarWinds trademarks, service marks, and logos are the exclusive property of SolarWinds Worldwide, LLC or its affiliates. All other trademarks are the property of their respective owners.

Pranay Kamat is Senior Product Manager at Loggly. His previous experience includes designing user interfaces, APIs, and data migration tools for Oracle and Accela. He has an MBA from The University of Texas at Austin and a master's degree in computer science from Cornell University.