Infrastructure and applications are rapidly shifting to more elastic, distributed cloud environments. As a result, the best DevOps strategies now require full visibility not only up and down the stack, but also across all stages of the application lifecycle. Traditional health checks, element managers, and systems consolidators typically can’t keep up with greater data variety and the complexity of modern architectures.

In response, tools like APM and log management have emerged to help with pre-deployment engineering and post-event analysis.

However, APM and log management both rely on the luxury of time. They tell us little about the production environment, where real-time visibility matters most.

For today’s cloud applications, modern infrastructure monitoring aggregates metrics for an essential real-time view of your whole environment, whether in production or development. By focusing on time series analytics, infrastructure monitoring fills a large gap not previously addressed by APM or logs: intelligent and timely alerting on service-wide issues and trends.

The primary objective of APM is to test code against downstream performance issues before deployment. APM should be used for what it is exceptional at doing: providing transaction traces and identifying bottlenecks in code before production. Meanwhile, several factors outside of your code can create real issues that affect the operations and performance of your application in production.

APM tools were not designed for monitoring and alerting on the service-level operations of today’s diverse environments. They can’t aggregate and alert on patterns in high-fidelity metrics flowing from open-source middleware like Kafka, Elasticsearch, Cassandra, Docker, and Mesos. And they are not useful when it comes to correlating insights from both your cloud services and your legacy networks, storage, servers, and databases.

Similarly, logs are not particularly useful for alerting on real-time infrastructure issues. Because logs are primarily unstructured data, they are well suited to batch analysis as a post-mortem review of a discrete event. However, massive data volume makes logs a bad fit for the real-time search and stream processing that timely alerts rely on.

Aggregating metrics is your best line of defense for monitoring a production environment in real time. Streamed into an analytics-based infrastructure monitoring service, production metrics help you pinpoint and remediate issues before they become problems for your users.
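To make the idea concrete, here is a minimal sketch of service-wide alerting on aggregated metrics. It is an illustration only, not SignalFx's API: the `ServiceAlert` class, the host names, the CPU-style values, and the threshold are all hypothetical. Each interval, per-host values are averaged into one service-wide data point, and the alert fires only when that average has stayed above the threshold for a full rolling window.

```python
from collections import deque
from statistics import mean


class ServiceAlert:
    """Hypothetical helper: alert on a sustained service-wide average."""

    def __init__(self, threshold, window=3):
        self.threshold = threshold
        # Keep only the most recent `window` service-wide averages.
        self.recent = deque(maxlen=window)

    def observe(self, per_host_values):
        """Record one interval of per-host values; return True if the alert fires."""
        # Aggregate across hosts so a single noisy host does not page anyone.
        self.recent.append(mean(per_host_values.values()))
        window_full = len(self.recent) == self.recent.maxlen
        # Fire only if every average in the full window exceeds the threshold.
        return window_full and min(self.recent) > self.threshold


# Made-up sample stream: one dict of host -> utilization per interval.
alert = ServiceAlert(threshold=0.8, window=3)
stream = [
    {"web-1": 0.50, "web-2": 0.95, "web-3": 0.40},  # one hot host only
    {"web-1": 0.85, "web-2": 0.90, "web-3": 0.88},
    {"web-1": 0.90, "web-2": 0.92, "web-3": 0.91},
    {"web-1": 0.93, "web-2": 0.95, "web-3": 0.94},
]
fired = [alert.observe(sample) for sample in stream]
print(fired)
```

In this sketch the first interval, where only one host runs hot, never triggers the alert on its own. The alert fires on the fourth interval, once the service-wide average has exceeded the threshold for three consecutive intervals. A production monitoring service would apply the same idea to streaming data at far larger scale and with richer analytics.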

With real-time insight into the production environment introduced by modern infrastructure monitoring, application developers, infrastructure engineers, and operations teams can collaborate across the entire application lifecycle for the first time. Intelligent alerting from infrastructure monitoring is the missing piece between APM’s pre-production performance engineering and log management’s post-hoc event analysis. 

To learn more, check out SignalFx, the most advanced monitoring solution for cloud infrastructure and applications. Our team previously built the analytics system in use at Facebook that monitors more than 22 trillion metrics per day. Now we provide monitoring-as-a-service to help operations and product teams of all sizes manage their cloud environments in production. Get started now with a free trial!



About the authors

Ryan Goldman

Ryan is Head of Marketing at SignalFx. Previously, he managed product marketing at Cloudera and Mixpanel, was a marketing strategist at Cisco, and supported international development NGO projects in Washington, D.C.
