What is Amazon EBS?

Amazon Elastic Block Store (EBS) provides disk volumes for Elastic Compute Cloud (EC2) instances. It’s elastic because users can quickly create volumes sized for the capacity and performance they need. EBS also improves durability by automatically replicating each volume within its Availability Zone, and it makes it easy to set up encryption and create snapshots.

Amazon EBS Monitoring

Since disk volumes are a foundational resource for operating systems and their applications, it’s important to monitor them to ensure your application’s availability. If your disk has hit a performance limit, it can slow down your application. If the disk is full, it may even crash the server.

SignalFx offers pre-built content straight out of the box so that you can easily get insight into factors like read latency and throughput rate without having to set up your own trend graphs.


Volumes: There is a 500% difference in cost per GB between the least expensive HDD and the most expensive SSD, so monitoring relative to cost is essential. Monitor IOPS and throughput during peak usage or a stress test to determine whether they are critical factors in your application’s performance.

Total Operations: Seeing the total I/O operations and throughput across your entire cluster gives you an idea of how much capacity you’re using in aggregate. Large spikes might indicate a system-wide activity pattern, like a sudden increase in traffic to your website.

Read Operations: Looking at the number of bytes per read operation provides insight into how many volumes are experiencing small reads versus large reads. Small reads are better served by an SSD, which has more IOPS capacity. HDDs are a better fit for large reads because of their higher maximum I/O size (1,024 KiB, versus 256 KiB for SSD volumes).
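
As a rough illustration of the bytes-per-operation figure, the sketch below divides the bytes read by the number of read operations over the same period and labels the result as small or large. The function names and the 64 KiB cutoff are hypothetical choices for this example, not values taken from SignalFx or AWS.

```python
# Hypothetical cutoff between "small" and "large" reads (an assumption).
SMALL_READ_BYTES = 64 * 1024


def avg_bytes_per_op(total_bytes: float, total_ops: float) -> float:
    """Average I/O size over a period; 0.0 when there were no operations."""
    if total_ops <= 0:
        return 0.0
    return total_bytes / total_ops


def classify_reads(read_bytes: float, read_ops: float,
                   cutoff: int = SMALL_READ_BYTES) -> str:
    """Label a volume's read pattern as 'idle', 'small', or 'large'."""
    if read_ops <= 0:
        return "idle"
    size = avg_bytes_per_op(read_bytes, read_ops)
    return "small" if size <= cutoff else "large"
```

A volume that averages 4 KiB per read would be labeled "small" here and might be a candidate for an SSD-backed volume type.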

Write Operations: Determine how many servers are experiencing a high number of write operations through the percentile distribution. You can see the number of bytes per operation, which could help you provision the best type of volume.

Top Queue Length: If your application has a high number of IOPS, you might be able to decrease the latency by keeping a low queue length. A higher queue length allows you to achieve higher throughput for sequential reads or writes, but potentially at the cost of higher latency.
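
One way to reason about this trade-off is Little's Law, which relates the average number of outstanding requests to the completion rate and the average time per request. The sketch below is a simplified steady-state model with hypothetical function names, not a description of how EBS or SignalFx computes queue length.

```python
def estimated_queue_length(iops: float, avg_latency_seconds: float) -> float:
    """Little's Law: outstanding I/Os = completion rate * time per I/O."""
    return iops * avg_latency_seconds


def estimated_latency_ms(queue_length: float, iops: float) -> float:
    """Rearranged Little's Law: average latency implied by a queue depth."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return queue_length * 1000.0 / iops
```

Under this model, a volume that completes 2,000 IOPS with a queue length of 8 implies an average latency of about 4 ms, while the same rate at a queue length of 2 implies about 1 ms.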

Read & Write Latency: End-to-end time to complete an I/O operation is reported as a percentile distribution, along with statistics like average and maximum times. If latency is higher than expected, IOPS or throughput may have hit a limit, or the queue length may not be optimized for the volume type.
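
To make the latency statistics concrete, the sketch below derives an average latency per volume from total time and operation counts, then takes a nearest-rank percentile across volumes. It assumes total time is reported in seconds for the same period as the operation count; the function names are hypothetical.

```python
import math


def avg_latency_ms(total_time_seconds: float, ops: float) -> float:
    """Average time per I/O in milliseconds; 0.0 when no I/O completed."""
    if ops <= 0:
        return 0.0
    return total_time_seconds * 1000.0 / ops


def percentile(values, pct: float) -> float:
    """Nearest-rank percentile (0 < pct <= 100) of a non-empty sequence."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

For example, feeding the per-volume averages into `percentile(latencies, 90)` gives the latency that 90% of volumes stay at or below.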

The SignalFx Difference

Per Volume Metrics: SignalFx also provides access to metrics on a per-volume basis, which are useful when drilling down to troubleshoot issues with a specific volume. The per-volume dashboard shows operations per minute, throughput, and latency. The charts give an operational view of reads versus writes, as well as the percentage change for each relative to 24 hours ago.

Volume Size & Utilization: Amazon CloudWatch does not offer metrics about volume size or percent utilization. It’s important to monitor disk size because, if your server runs out of disk space, it can become unstable or crash. The SignalFx plugin for collectd helps monitor disk usage and compute additional metadata, making it easier to analyze.
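
For a sense of what percent utilization means on the instance itself, the sketch below reads filesystem usage with Python's standard library. It is only an illustration run from inside the operating system, not the SignalFx collectd plugin, and the function name is hypothetical.

```python
import shutil


def percent_used(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return usage.used * 100.0 / usage.total


if __name__ == "__main__":
    print(f"{percent_used('/'):.1f}% used")
```

The figure can differ slightly from what tools like `df` report, since some filesystems reserve blocks that are counted differently.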

Intelligent Alerting: Easily create custom calculations for EBS in the SignalFx interface. For example, you can create a dynamic threshold based on a percentile by applying analytics like time shift and mean with just a few clicks. An alert detector that notifies you when disk space decreases 30% overnight is a more meaningful signal, and results in fewer false positives, than a basic static threshold.
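
The idea behind a time-shifted threshold can be sketched in a few lines: compare the latest reading with the reading from a fixed number of samples earlier, and fire when the relative drop exceeds a limit. This is a plain-Python illustration with hypothetical names, not SignalFx detector syntax, and it interprets "disk space" as free space.

```python
def percent_change(current: float, previous: float) -> float:
    """Relative change from `previous` to `current`, in percent."""
    if previous == 0:
        raise ValueError("previous must be non-zero")
    return (current - previous) * 100.0 / previous


def free_space_dropped(samples, shift: int, drop_pct: float = 30.0) -> bool:
    """True when the newest sample is at least `drop_pct` percent below
    the sample taken `shift` readings earlier (oldest sample first)."""
    if shift < 1 or len(samples) <= shift:
        return False  # not enough history to compare against
    current = samples[-1]
    previous = samples[-1 - shift]
    if previous <= 0:
        return False
    return percent_change(current, previous) <= -drop_pct
```

With hourly samples, a `shift` of 12 would roughly correspond to an overnight comparison.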

Amazon EBS Metrics

Read Bytes
Write Bytes
Read Ops
Write Ops
Volume Total Read Time
Volume Total Write Time
Idle Time
Queue Length
Volume Throughput
Volume Consumed Read Write Ops

Start Your Amazon EBS Monitoring Trial

Try SignalFx for 14 days. No credit card required.