Monitoring Disk Command Aborts on ESXi: Identifying Storage Overload | Lazy Admin Blog

When your storage subsystem is severely overloaded, it cannot complete commands within the timeout defined by the guest operating system. The result? Disk command aborts. For Windows VMs, the guest usually issues an abort after 60 seconds without a response from the storage array.

Aborted commands are a critical red flag indicating that your storage hardware is overwhelmed and unable to meet the host’s performance expectations. Monitoring this parameter is essential for proactive datacenter management.

Here is how you can track these aborts using two primary methods: the vSphere Client and esxtop.


💻 Method 1: vSphere Client (Graphical Interface)

This method provides a visual, historical look at command aborts across your infrastructure.

  1. Navigate to Hosts and Clusters.
  2. Select the object you want to monitor (Host or Cluster).
  3. Click on the Monitor tab, then Performance, and select Advanced.
  4. Click Chart Options.
  5. Switch the metric grouping to Disk.
  6. Select Commands aborted from the list of measurements.
  7. Click OK.

🛠️ Method 2: esxtop (Command Line Interface)

For real-time, granular troubleshooting, esxtop is the definitive tool. Its ABRTS/s (aborts per second) field reports SCSI command aborts for each device.

Steps to Configure esxtop for Aborts:

  1. Open PuTTY (or any other SSH client) and log in to your ESXi host via SSH.
  2. Type esxtop and press Enter.
  3. Type u to switch to the Disk Device view.
  4. Type f to change the field settings.
  5. Type L to select Error stats.
  6. Press Enter, then press W to save these settings for future sessions.

You will now see the ABRTS/s column. This number is the rate of SCSI commands aborted by the guest VM, expressed per second and averaged over the esxtop refresh interval (5 seconds by default).
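
For longer observation windows, esxtop can also run in batch mode (for example `esxtop -b -d 5 -n 12 > esxtop.csv`, which writes 12 samples at 5-second intervals as CSV). The Python sketch below is one possible way to pull the abort counters out of such a capture. It assumes the abort columns end in `Aborts/sec`; check the header row of your own capture, since counter names can differ between ESXi versions. The host name `esx01`, the device `naa.demo`, and the sample values are made up for illustration.

```python
import csv
import io


def abort_columns(header):
    """Return the indexes of columns whose counter name ends in 'Aborts/sec'.

    The 'Aborts/sec' suffix is an assumption about the batch-mode header.
    """
    return [i for i, name in enumerate(header) if name.endswith("Aborts/sec")]


def max_aborts_per_device(csv_text):
    """Map each abort counter name to the highest value seen in the capture."""
    rows = csv.reader(io.StringIO(csv_text))
    header = next(rows)
    cols = abort_columns(header)
    peaks = {header[i]: 0.0 for i in cols}
    for row in rows:
        for i in cols:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip blank or truncated cells
            peaks[header[i]] = max(peaks[header[i]], value)
    return peaks


# Hypothetical two-sample capture with a single abort counter column.
SAMPLE = (
    r'"Time","\\esx01\Physical Disk SCSI Device(naa.demo)\Aborts/sec"' "\n"
    '"01/01/2024 10:00:00","0.00"\n'
    '"01/01/2024 10:00:05","1.50"\n'
)

peaks = max_aborts_per_device(SAMPLE)
for counter, peak in peaks.items():
    print(f"{counter}: peak {peak:.2f} aborts/s")
```

For a real capture, read the file contents and pass them to `max_aborts_per_device` in place of `SAMPLE`.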


📈 Thresholds and Interpretation

If you are deploying a monitoring tool, the critical threshold for ABRTS/s is 1. A value of 1 or higher means SCSI commands are actively being aborted by the guest OS because the storage is not responding.
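
As a minimal sketch of that rule, a monitoring check could compare each sample against the threshold. The function name, the device label, and the message format here are hypothetical and not tied to any particular monitoring product.

```python
CRITICAL_ABRTS_PER_SEC = 1.0  # critical threshold stated in the text


def check_sample(device, abrts_per_sec, threshold=CRITICAL_ABRTS_PER_SEC):
    """Return an alert string if the sample meets the threshold, else None."""
    if abrts_per_sec >= threshold:
        return f"CRITICAL: {device} ABRTS/s={abrts_per_sec:.2f} (threshold {threshold:.2f})"
    return None


# Hypothetical device name and readings.
print(check_sample("naa.demo", 0.2))
print(check_sample("naa.demo", 1.0))
```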

What is Ideal?

In an ideal scenario, ABRTS/s should always be 0.

What is Real-World?

In a busy production environment, you may see fractional values between 0 and 1. This occurs during "peak hours", for instance when multiple servers on the host run disk-intensive backup operations at the same time and temporarily saturate the storage. However, any sustained value of 1 or higher requires immediate investigation into path failures, array congestion, or complete storage unresponsiveness.
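
One way to separate brief peak-hour blips from a sustained problem is to look for consecutive samples at or above the threshold. The sketch below is an illustration only: the requirement of three consecutive samples (`min_consecutive=3`) is an assumed value rather than a VMware recommendation, and the sample series are invented.

```python
def longest_run_at_or_above(samples, threshold=1.0):
    """Length of the longest consecutive run of samples >= threshold."""
    longest = current = 0
    for value in samples:
        current = current + 1 if value >= threshold else 0
        longest = max(longest, current)
    return longest


def needs_investigation(samples, threshold=1.0, min_consecutive=3):
    """True when the threshold is met for min_consecutive samples in a row.

    min_consecutive is an assumed tuning value; adjust it to your interval.
    """
    return longest_run_at_or_above(samples, threshold) >= min_consecutive


# Invented ABRTS/s series: a backup window versus an unresponsive array.
backup_window = [0.0, 0.2, 0.4, 0.0]
array_stall = [0.0, 1.2, 3.0, 2.5, 0.1]

print(needs_investigation(backup_window))
print(needs_investigation(array_stall))
```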