GPU Utilization Metrics

Key Desiderata for a GPU Utilization Statistic

An ideal GPU utilization metric should reward high average usage, penalize time spent idle or at low utilization, and account for stability (low variance).

High mean utilization is good.
Low variance (stable, not spiky) is good.
Duration (time normalization) matters: long idle periods should lower the score.
Interpretable on a [0, 1] (or 0% to 100%) scale, if possible.

Let \(u_1, u_2, \ldots, u_n\) be GPU utilization samples in percent (values in [0, 100]), taken at regular intervals.

Mean utilization:

\[ \mu_u = \frac{1}{n} \sum_{i=1}^n u_i \]

Population standard deviation (dividing by \(n\), not \(n - 1\)):

\[ \sigma_u = \sqrt{ \frac{1}{n} \sum_{i=1}^n (u_i - \mu_u)^2 } \]
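A minimal Python sketch of \(\mu_u\) and \(\sigma_u\), written directly from the two formulas above. The function names `mean_util` and `std_util` are illustrative, not from any library; the standard library's `statistics.pstdev` computes the same population standard deviation.

```python
import math
from typing import Sequence


def mean_util(u: Sequence[float]) -> float:
    """Arithmetic mean of utilization samples (percent, 0-100)."""
    if not u:
        raise ValueError("need at least one sample")
    return sum(u) / len(u)


def std_util(u: Sequence[float]) -> float:
    """Population standard deviation: divides by n, matching sigma_u."""
    mu = mean_util(u)
    return math.sqrt(sum((x - mu) ** 2 for x in u) / len(u))


samples = [90.0, 70.0, 90.0, 70.0]
print(mean_util(samples), std_util(samples))  # 80.0 10.0
```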

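Effective utilization, which subtracts a variability penalty scaled by a weight \(\lambda \ge 0\) (in the same percent units as \(\mu_u\)):

\[ \text{EffU} = \mu_u - \lambda \sigma_u \]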
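The EffU formula can be sketched with the standard-library `statistics` module, assuming samples are percentages. The name `eff_util` and the parameter `lam` (standing in for λ, since `lambda` is a Python keyword) are hypothetical.

```python
import statistics
from typing import Sequence


def eff_util(u: Sequence[float], lam: float = 1.0) -> float:
    """EffU = mean - lam * population std, in percent units."""
    # statistics.pstdev divides by n, matching sigma_u.
    return statistics.fmean(u) - lam * statistics.pstdev(u)


steady = [80.0, 80.0, 80.0, 80.0]
bursty = [100.0, 60.0, 100.0, 60.0]
print(eff_util(steady), eff_util(bursty))  # 80.0 60.0
```

Both traces have a mean of 80%, but the bursty one loses 20 points to the variance penalty at λ = 1.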

Fraction of samples above a high-utilization threshold \(\theta\) (a value in [0, 1]):

\[ \text{Frac}_{\theta} = \frac{1}{n} \sum_{i=1}^n \mathbf{1}\{ u_i > \theta \} \]
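A sketch of \(\text{Frac}_{\theta}\); the helper name `frac_above` is hypothetical. Note the strict inequality: a sample exactly at θ does not count.

```python
from typing import Sequence


def frac_above(u: Sequence[float], theta: float = 80.0) -> float:
    """Fraction of samples strictly above theta, i.e. the mean of 1{u_i > theta}."""
    if not u:
        raise ValueError("need at least one sample")
    return sum(1 for x in u if x > theta) / len(u)


print(frac_above([95.0, 80.0, 85.0, 10.0], theta=80.0))  # 0.5
```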

Normalized area under the utilization curve:

\[ \text{AUC}_u = \frac{1}{100 n} \sum_{i=1}^n u_i \]

This equals \(\mu_u / 100\), i.e. the mean normalized to [0, 1].
Interpreting it as an area under the utilization curve assumes a uniform sampling interval; for non-uniform intervals, use the time-weighted mean below.
AUC reflects only the average work done, not how that work was distributed in time. For hardware optimization and system diagnosis, you also want to know whether the workload is steady or bursty, and how often the GPU is left waiting.
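To illustrate this limitation, the sketch below compares two hypothetical traces with identical \(\text{AUC}_u\) but very different time profiles. `auc_util` is an illustrative helper, and the computation assumes uniformly spaced samples.

```python
import statistics
from typing import Sequence


def auc_util(u: Sequence[float]) -> float:
    """AUC_u for uniformly spaced samples: mean utilization rescaled to [0, 1]."""
    return sum(u) / (100.0 * len(u))


steady = [50.0, 50.0, 50.0, 50.0]
bursty = [100.0, 0.0, 100.0, 0.0]
# Same "average work done"...
print(auc_util(steady), auc_util(bursty))  # 0.5 0.5
# ...but very different spread around that average.
print(statistics.pstdev(steady), statistics.pstdev(bursty))  # 0.0 50.0
```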

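A simple composite metric is EffU with \(\lambda = 1\), divided by 100:

\[ \text{GPU Efficiency} = \frac{\mu_u - \sigma_u}{100} \]

It is at most 1 and becomes negative when \(\sigma_u > \mu_u\), for example on a mostly idle trace with rare spikes.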
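A sketch of GPU Efficiency using population statistics, as in \(\sigma_u\); the function name is hypothetical and the traces are made-up examples. The second call shows a case where the score goes negative.

```python
import statistics
from typing import Sequence


def gpu_efficiency(u: Sequence[float]) -> float:
    """(mean - population std) / 100, i.e. EffU with lambda = 1, rescaled."""
    return (statistics.fmean(u) - statistics.pstdev(u)) / 100.0


print(gpu_efficiency([90.0, 70.0, 90.0, 70.0]))  # 0.7
# One spike in an otherwise idle trace: sigma exceeds mu, so the score is negative.
print(round(gpu_efficiency([0.0, 0.0, 0.0, 100.0]), 3))  # -0.183
```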

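Alternatively, blend the normalized mean with the fraction of high-utilization samples:

\[ \text{GPU Utilization Score} = \alpha \cdot \frac{\mu_u}{100} + (1 - \alpha) \cdot \text{Frac}_{\theta} \]

where \(\alpha \in [0, 1]\) is a weight (e.g., 0.5) and \(\theta\) is a high-utilization threshold (e.g., 80%). Only \(\mu_u\) is divided by 100, because \(\text{Frac}_{\theta}\) is already a fraction; both terms then lie in [0, 1], and so does the score.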
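A minimal sketch of the blended score. Because \(\text{Frac}_{\theta}\) is already a fraction while \(\mu_u\) is in percent, the sketch divides only the mean by 100 so both terms share the [0, 1] scale. The function and parameter names are hypothetical; the defaults follow the example values α = 0.5 and θ = 80%.

```python
from typing import Sequence


def gpu_utilization_score(
    u: Sequence[float], alpha: float = 0.5, theta: float = 80.0
) -> float:
    """alpha * (mean / 100) + (1 - alpha) * fraction of samples above theta."""
    if not u:
        raise ValueError("need at least one sample")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    n = len(u)
    mean_frac = sum(u) / (100.0 * n)  # mu_u / 100, in [0, 1]
    frac_theta = sum(1 for x in u if x > theta) / n  # already in [0, 1]
    return alpha * mean_frac + (1.0 - alpha) * frac_theta


print(gpu_utilization_score([90.0, 90.0, 90.0, 30.0]))  # 0.75
```

Here three of the four samples exceed 80% and the mean is 75%, so both terms equal 0.75.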

If the sampling intervals are not uniform, weight each sample \(u_i\) by its interval length \(\Delta t_i\) and divide by the total time:

\[ \text{TimeWeightedMean} = \frac{ \sum_{i=1}^n u_i \Delta t_i }{ \sum_{i=1}^n \Delta t_i } \]
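A sketch of the time-weighted mean, assuming each \(\Delta t_i\) is the duration that sample \(u_i\) represents (for example, the gap to the next timestamp; how intervals are derived from your sampler is an assumption). The name `time_weighted_mean` is hypothetical.

```python
from typing import Sequence


def time_weighted_mean(u: Sequence[float], dt: Sequence[float]) -> float:
    """sum(u_i * dt_i) / sum(dt_i), where dt_i is the duration of sample i."""
    if len(u) != len(dt):
        raise ValueError("u and dt must have the same length")
    total = sum(dt)
    if total <= 0:
        raise ValueError("total duration must be positive")
    return sum(x * d for x, d in zip(u, dt)) / total


# 90% held for 3 s, then 10% for 1 s; the unweighted mean would be 50.0.
print(time_weighted_mean([90.0, 10.0], [3.0, 1.0]))  # 70.0
```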

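High mean with high variance may indicate bursty, batch-driven work or pipeline stalls; the variance-penalized metrics (EffU, GPU Efficiency) give it a lower score.
High mean with low standard deviation is the ideal case (normalized score near 1).
Low mean with low standard deviation means the GPU is consistently idle (score near 0, or slightly negative for the variance-penalized metrics).
The composite metrics can be tuned through λ or α to emphasize stability or average utilization, depending on the workload.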
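The sketch below applies EffU, divided by 100, to three hypothetical traces matching the cases above, for several values of λ (`lam`). With λ = 1 this is the GPU Efficiency metric. The traces and names are illustrative only.

```python
import statistics
from typing import Dict, List


def eff_util(u: List[float], lam: float) -> float:
    """EffU = mu_u - lam * sigma_u (percent units, population std)."""
    return statistics.fmean(u) - lam * statistics.pstdev(u)


# Hypothetical traces for the three cases: steady-high, bursty-high, idle.
traces: Dict[str, List[float]] = {
    "steady_high": [90.0, 92.0, 88.0, 90.0],
    "bursty_high": [100.0, 60.0, 100.0, 60.0],
    "idle": [2.0, 0.0, 3.0, 1.0],
}

for name, u in traces.items():
    # Larger lam penalizes variance more heavily; /100 maps percent to a fraction.
    scores = {lam: eff_util(u, lam) / 100.0 for lam in (0.5, 1.0, 2.0)}
    print(name, {lam: f"{s:.2f}" for lam, s in scores.items()})
```

On these traces, raising λ from 0.5 to 2 drops the bursty trace from 0.70 to 0.40, while the steady trace only moves from 0.89 to 0.87, and the idle trace stays near zero (dipping slightly below it at λ = 2).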