What is Spark?

Apache Spark is an open-source, general-purpose cluster computing system. With its in-memory data processing engine, development APIs, and support for higher-level tools, it lets data workers efficiently execute streaming, data processing, ETL, machine learning, or SQL workloads that require fast, interactive access to datasets. With Spark running, developers can create applications that exploit Spark’s power, derive insights, and enrich their data science workloads within a single, shared dataset in Hadoop.

The Challenges with Monitoring Spark

The adoption of Spark and the associated desire to innovate with massive amounts of data have changed the requirements of monitoring tools. There are two main sources of difficulty when monitoring Spark in production environments: the complexity of its multiple layers and the ephemeral nature of Spark workloads.

Complexities with Multiple Layers

The Spark architecture consists of multiple components at multiple layers, all of which come together to make a Spark application work:

  • Spark infrastructure, including Spark Master and Spark Worker
  • Applications that run on top of the infrastructure, including Spark Executors and Spark Driver
  • Underlying resources, including disk, CPU, network bandwidth

Understanding the performance and utilization of Spark clusters and applications requires real-time visibility into each and every component.

Ephemeral Nature of Spark

The amount of time required to run a Spark application varies by use case. Developers may need tasks to run once a day, a few times a day, or only once a week or month. With tasks moving in and out of the environment at any given moment, it becomes increasingly difficult to track and monitor performance in production.

The SignalFx Difference

Monitoring at Every Level

When it comes to operating Spark environments at scale, understanding performance depends on the dependencies among the various layers of the Spark architecture. With SignalFx, you can monitor from Spark process to node to cluster, or from job to application stage, in a single dashboard. Instant visibility across all layers of the Spark architecture gives you both the flexibility to gain a service-wide view of performance and the power to explore individual details.

For Spark admins, start with an aggregated view of your Spark cluster. Easily drill down to master processes and worker processes. Correlate performance metrics down to the specific node, and evaluate whether the application is impacted at the service level.

For Spark application developers, start with an aggregated view of your data by Spark application and user. Easily drill down to key metrics on active stages and runtimes, driver and executor utilization, and processing times for streaming applications.

Select Spark Metrics

  • JVM Used/Committed
  • JVM Heap Used/Committed
  • Worker Cores Free/Used
  • Worker Executors
  • Worker Memory
  • Master Workers
  • Master Applications
  • Spark Job Tasks
  • Job Active/Complete/Skipped/Failed Tasks
  • Job Active/Complete/Skipped/Failed Stages
  • Stage Input/Output Bytes & Records
  • Memory Usage
  • Max Memory
  • Disk Usage
  • Executor Shuffle
  • Spark Executor
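Many of these metrics are also available from Spark's monitoring REST API on the driver (under /api/v1). As a minimal sketch, the snippet below computes cluster-wide storage memory utilization from a payload shaped like the executors endpoint's response; the JSON values here are illustrative sample data, not output from a real cluster:

```python
import json

# Sample payload shaped like Spark's /api/v1/applications/<app-id>/executors
# response (fields abbreviated; the numbers are made up for illustration).
sample = json.loads("""
[
  {"id": "driver", "memoryUsed": 52428800, "maxMemory": 434031820, "diskUsed": 0},
  {"id": "1", "memoryUsed": 104857600, "maxMemory": 434031820, "diskUsed": 1048576}
]
""")

def memory_utilization(executors):
    """Fraction of maximum storage memory currently in use across executors."""
    used = sum(e["memoryUsed"] for e in executors)
    available = sum(e["maxMemory"] for e in executors)
    return used / available if available else 0.0

print(f"cluster storage memory utilization: {memory_utilization(sample):.1%}")
```

In a live environment, the same calculation would run against the JSON returned by the driver, with the result emitted as a gauge to your monitoring system.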



Instant Visibility at Scale

There are many metrics specific to Spark, and knowing where to start and what to monitor can be difficult. Garbage collection stalls or abnormal memory-usage patterns can create issues. Performance problems typically arise during shuffles. And the latency and throughput of Spark applications directly impact users.
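As one concrete example of the kind of alert this enables, a simple check can flag executors that spend too large a share of their run time in garbage collection. This is a hedged sketch: the 10% threshold is an illustrative starting point, not a SignalFx or Spark default.

```python
def gc_alert(gc_time_ms, executor_run_time_ms, threshold=0.10):
    """Flag an executor spending more than `threshold` of its run time in GC.

    The 0.10 threshold is an assumed example value; tune it per workload.
    """
    if executor_run_time_ms == 0:
        return False
    return gc_time_ms / executor_run_time_ms > threshold

# e.g. 12 s of GC over a 60 s window is 20% of run time spent collecting
print(gc_alert(12_000, 60_000))  # True
```

The same pattern generalizes to shuffle spill, task failure rate, or streaming batch latency: pick the ratio that matters, baseline it, and alert on deviation.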

SignalFx provides out-of-the-box insights across all the Spark metrics that matter. Built-in dashboards for Spark give you a running start on monitoring your complex, distributed environment. SignalFx also curates data from the other applications and cloud services in your environment, enabling you to embed your own best practices for monitoring and alerting across the services important to your specific use case.


Request Your Spark Monitoring Demo

Ready to learn more? Take a guided tour of SignalFx.