Customers love us for our out-of-the-box integrations and built-in dashboards for services like ActiveMQ. Underneath our no-fuss solutions, SignalFx runs the most powerful monitoring service for modern applications.
One of our customers recently ran into an ActiveMQ problem that they couldn’t pin down without SignalFx. In their version of ActiveMQ, messages sometimes get “stuck” in the queue, and message consumers won’t pick them up even when they have available capacity. As a result, those messages are never delivered.
Each occurrence was a critical issue, because every “stuck” message was a client request going unfulfilled. Other monitoring tools weren’t able to detect this condition when it happened, because they couldn’t provide visibility into the messages that never make it out of the queue. There was no way for them to tell whether messages were stuck in the queue, or for how long. This issue became a problem for their own customers and business.
They solved this problem by writing a tool that inspects each enqueued message, calculates the average and maximum age of the messages in each queue, and then reports those metrics to SignalFx using our Java client library. They’ve let us share this project with everyone in the wider community. You can find it on GitHub at https://github.com/signalfx/activemq-integration.
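To make the calculation concrete, here is a minimal sketch of the age arithmetic alone. It is not taken from the activemq-integration project: the class name `QueueAgeStats` and its method are hypothetical, and it assumes you already have the enqueue timestamp of every message still in the queue (one way to get these might be a JMS `QueueBrowser` together with `Message.getJMSTimestamp()`). Reporting the results to SignalFx is left out.

```java
// Hypothetical helper for illustration; not the code from the GitHub project.
final class QueueAgeStats {
    final double averageAgeMillis;
    final long maxAgeMillis;

    private QueueAgeStats(double averageAgeMillis, long maxAgeMillis) {
        this.averageAgeMillis = averageAgeMillis;
        this.maxAgeMillis = maxAgeMillis;
    }

    /**
     * Computes the average and maximum age of the messages in one queue.
     *
     * @param enqueueTimestampsMillis enqueue time of each waiting message (epoch millis)
     * @param nowMillis               the time of the measurement (epoch millis)
     */
    static QueueAgeStats fromTimestamps(long[] enqueueTimestampsMillis, long nowMillis) {
        if (enqueueTimestampsMillis.length == 0) {
            // Assumption: an empty queue is reported as age 0.
            return new QueueAgeStats(0.0, 0L);
        }
        double sum = 0.0;
        long max = 0L;
        for (long enqueuedAt : enqueueTimestampsMillis) {
            // Clamp at 0 so clock skew cannot produce a negative age.
            long age = Math.max(0L, nowMillis - enqueuedAt);
            sum += age;
            max = Math.max(max, age);
        }
        return new QueueAgeStats(sum / enqueueTimestampsMillis.length, max);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long[] enqueuedAt = {now - 1_000, now - 4_000, now - 7_000};
        QueueAgeStats stats = fromTimestamps(enqueuedAt, now);
        System.out.println("average age (ms): " + stats.averageAgeMillis);
        System.out.println("maximum age (ms): " + stats.maxAgeMillis);
    }
}
```

The maximum age is the value to watch for stuck messages: in a healthy queue it drops whenever the oldest message is consumed, while a stuck message makes it climb without interruption.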
We’ve built a dashboard that displays these metrics, so it’s easy to see whether any queues hold messages that have been waiting… and waiting… and waiting…
Using these metrics from inside each message queue, we can create intelligent detectors that alert when there’s a message stuck in the queue and unable to be delivered. For example, you can create a detector that fires when the oldest message in the queue has been getting older for at least 5 minutes. To build this, we use the analytics function “Rate of change”, which tells us how quickly a metric is changing.
In this example, rate of change tells us how much older the oldest message in each queue is getting each time we measure it. When this function is greater than 0, it tells us that a message is sitting in the queue, aging. If this continues for a long time, it could indicate that one or more messages are stuck and not being picked up.
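The detector logic described above can be sketched in plain Java. This is only an illustration of the idea, not how SignalFx implements detectors or its “Rate of change” function: the class `StuckMessageDetector` is hypothetical, and the rate here is assumed to be the difference between consecutive samples divided by the elapsed seconds.

```java
// Illustrative sketch only; real detectors are configured in SignalFx, not hand-written.
final class StuckMessageDetector {
    private final long requiredGrowthMillis;
    private boolean hasPrevious = false;
    private long previousTimestampMillis;
    private double previousMaxAgeMillis;
    private boolean growing = false;
    private long growingSinceMillis;

    StuckMessageDetector(long requiredGrowthMillis) {
        this.requiredGrowthMillis = requiredGrowthMillis;
    }

    /** Change in the metric per second between two consecutive samples. */
    static double ratePerSecond(double previousValue, double currentValue, long elapsedMillis) {
        return (currentValue - previousValue) / (elapsedMillis / 1000.0);
    }

    /**
     * Feeds one sample of the "maximum message age" metric for a queue.
     * Returns true once the rate of change has stayed above 0 for the required duration.
     */
    boolean observe(long timestampMillis, double maxAgeMillis) {
        boolean fire = false;
        if (hasPrevious) {
            long elapsed = timestampMillis - previousTimestampMillis;
            if (elapsed <= 0) {
                throw new IllegalArgumentException("samples must arrive in time order");
            }
            double rate = ratePerSecond(previousMaxAgeMillis, maxAgeMillis, elapsed);
            if (rate > 0) {
                if (!growing) {
                    // Growth started at the previous sample.
                    growing = true;
                    growingSinceMillis = previousTimestampMillis;
                }
                fire = timestampMillis - growingSinceMillis >= requiredGrowthMillis;
            } else {
                // The oldest message was consumed (or the queue emptied): reset.
                growing = false;
            }
        }
        hasPrevious = true;
        previousTimestampMillis = timestampMillis;
        previousMaxAgeMillis = maxAgeMillis;
        return fire;
    }

    public static void main(String[] args) {
        StuckMessageDetector detector = new StuckMessageDetector(5 * 60 * 1000L);
        // One sample per minute; the oldest message keeps aging the whole time.
        for (int minute = 0; minute <= 6; minute++) {
            long t = minute * 60_000L;
            boolean fired = detector.observe(t, 1_000 + t);
            System.out.println("minute " + minute + ": fire=" + fired);
        }
    }
}
```

With one sample per minute and a steadily aging oldest message, this sketch starts firing at minute 5. A single sample where the maximum age drops or holds steady resets the timer, which keeps ordinary, briefly queued messages from triggering the alert.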
Now our customer has detailed information about the messages in their ActiveMQ queues and can monitor the conditions that really matter to their business.
When you need visibility into metrics that matter, get it done with SignalFx!