Macmillan Learning shrinks complex DevOps processes to minutes with Loggly
“Loggly has enabled Macmillan Learning to enhance, mature, and expand our instrumentation/triage processes across our product lines, while at the same time, reducing overhead on our engineers, allowing them to re-focus on our core value streams.”
Michael Basil, Senior Director of Engineering, Macmillan Learning
Highlights
- With the Loggly® solution, Macmillan takes multi-day, multi-group, multi-node processes down to just a few minutes, resulting in measurable time savings for DevOps teams
- Macmillan saved opportunity costs by not having to maintain a home-grown log management solution
- Macmillan uses Loggly to improve its ability to meet its 15-minute community SLA and 10-minute triage goals
Streamlining and simplifying the triage process
Macmillan LaunchPad is a cloud-based learning platform that delivers digital content and tools for education. Its service-oriented architecture weaves in many technology components such as .NET, Node®, Docker Swarm®, Amazon EC2®, MySQL®, Amazon® RDS, PostgreSQL, MongoDB®, Amazon S3, and Redis™. With so many interdependent components, root cause analysis is highly complex.
Loggly breaks down this complexity for log searching, according to Michael Basil, Senior Director of Engineering at Macmillan. Loggly Dynamic Field Explorer™ (DFE) provides a summary of the log events and categorizes the data using various field and tag names.
“It’s like a newspaper article. You can read the top and get what you want or do some advanced query and dig in further,” says Basil. The Dynamic Field Explorer serves as the headlines for a quick scan, and users can easily drill down into more details with the advanced search capabilities that Loggly provides. Loggly Gamut™ Search allows for a large set of data to be searched in parallel and yields fast results, delivering measurable time savings for Macmillan’s DevOps teams.
Macmillan uses Loggly to significantly simplify its triage process. “Within 10 minutes after an issue arises, first response is already identified. If the issue is something simple, it’s localized and corrected,” Basil explains. The Macmillan team relies extensively on Loggly for many use cases, all of which contribute to meeting its stringent SLA and triage goals. These use cases include, but are not limited to:
- Retroactive root cause analysis
- Debugging intermittent issues in production environments
- Heartbeat/health monitoring
- Cross-component debugging triage
- Locating noisy components with numerous errors
- Locating misconfigured components
Loggly provides transaction tagging capabilities that are instrumental in helping Macmillan connect the dots in its complex, service-oriented architecture. Tagging each transaction with a universally unique identifier (UUID) allows Macmillan’s teams to track it through the application workflow and understand how it evolves throughout its lifecycle. Third-party integration issues can also be identified and reported.
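The idea behind UUID tagging can be illustrated with a minimal sketch. It uses only Python's standard `logging`, `json`, and `uuid` modules and writes to an in-memory stream; it does not call any Loggly API, and it is not Macmillan's implementation. The component names (`gateway`, `grading-service`), the `transaction_id` field name, and the helper functions are all hypothetical.

```python
import io
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so fields stay searchable."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "component": record.name,
            # Set via the `extra` argument on each logging call.
            "transaction_id": getattr(record, "transaction_id", None),
            "message": record.getMessage(),
        })


def make_logger(component, stream):
    """Build a logger for one (hypothetical) component writing to `stream`."""
    logger = logging.getLogger(component)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def handle_request(stream):
    """Simulate one transaction crossing two components."""
    transaction_id = str(uuid.uuid4())
    extra = {"transaction_id": transaction_id}
    make_logger("gateway", stream).info("request received", extra=extra)
    make_logger("grading-service", stream).error("upstream timeout", extra=extra)
    return transaction_id


def events_for(transaction_id, lines):
    """Return the parsed events that belong to one transaction."""
    events = (json.loads(line) for line in lines if line.strip())
    return [e for e in events if e["transaction_id"] == transaction_id]


stream = io.StringIO()
first = handle_request(stream)
second = handle_request(stream)

# Filtering on the shared ID reconstructs one transaction's path.
trail = events_for(first, stream.getvalue().splitlines())
print([e["component"] for e in trail])
```

Because every component logs the same identifier, a single search on that field returns the whole path of a transaction, regardless of which service emitted each event.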
Macmillan has a solid tagging strategy for multiple groups to consume the logs coherently. This allows easy integration with status health endpoints at Macmillan. The status health endpoints show at a glance which components need attention.
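As a rough illustration of how consistently tagged logs could feed a status view, the sketch below counts error events per component and flags any component at or above a threshold. The event shape, component names, threshold, and status labels are assumptions made for this example; the source does not describe how Macmillan's status health endpoints are built.

```python
from collections import Counter

# Assumed cutoff: this many errors in the sample marks a component as unhealthy.
ERROR_THRESHOLD = 2

# Hypothetical events, each already tagged with the component that emitted it.
events = [
    {"component": "gateway", "level": "INFO"},
    {"component": "gateway", "level": "ERROR"},
    {"component": "grading-service", "level": "ERROR"},
    {"component": "grading-service", "level": "ERROR"},
    {"component": "content-store", "level": "INFO"},
]


def component_status(events, threshold=ERROR_THRESHOLD):
    """Map each component to 'ok' or 'needs attention' based on error count."""
    errors = Counter(e["component"] for e in events if e["level"] == "ERROR")
    components = sorted({e["component"] for e in events})
    return {
        name: "needs attention" if errors[name] >= threshold else "ok"
        for name in components
    }


status = component_status(events)
for name, state in status.items():
    print(f"{name}: {state}")
```

The summary only works if every group tags its events the same way, which is why a shared tagging strategy matters.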
With Loggly, Macmillan is able to meet its 15-minute community SLA and 10-minute triage goals for its LaunchPad platform, improving its customer experience and business foundation.
Inspiring collaboration and cultural changes
With all log events centralized in Loggly, every team at Macmillan can access the data for triage and troubleshooting purposes. With the self-service model, Macmillan removes communication gaps and frustrating bottlenecks in the triage process.
What used to take multiple groups multiple days to investigate across multiple nodes now takes only a few minutes. This significantly improves collaboration and productivity among groups. Engineering, Site Reliability Engineering, QA Ops, DevOps, and SysOps all rely on Loggly regularly, some on a daily basis.
Visibility beyond triage and troubleshooting
Aggregating all logs breaks down data silos at Macmillan for triage purposes. And because all the logs are centralized, Macmillan has a better understanding of its logging needs. It knows the overall volume of log data that its daily activities generate, and it can decide how much to retain for historical analysis without losing information it needs. This data-driven approach has made its budget and resource planning more strategic and precise, rather than an exercise in shooting in the dark.
Loggly with Macmillan Learning
Without much guidance, Macmillan’s teams have been able to reap significant benefits with the Loggly solution. The onboarding experience was smooth and fast, taking only one or two sprints. Loggly enables Macmillan to improve its processes, meet its stringent community and triage SLAs, and inspire cultural improvements.
“Loggly has enabled Macmillan Learning to enhance, mature, and expand our instrumentation/triage processes across our product lines, while at the same time, reducing overhead on our engineers, allowing them to re-focus on our core value streams,” Basil says.
In the coming months, Macmillan is excited to leverage more Loggly features, such as charts and dashboards for KPI tracking, and GitHub® integration that deep links into the source code so that issues can be identified and fixed faster.