Case Studies | ConserveMode™

Validation Study

H200 GPU Idle State Validation — RunPod Platform

Platform: RunPod · Hardware: NVIDIA H200 · Regions: 31 global · Measurement: nvidia-smi · Interval: 1 second

Baseline (Hot-Idle)

600–700W

ConserveMode™ Dormant

73.65W avg

Layer 1 Reduction

89%

Important context: The 89% figure applies to idle/dormant intervals only — not the full duty cycle average. At 50% GPU utilization (typical enterprise), approximately half of all GPU-hours are idle. Layer 1 captures all of that idle waste.

V2.0 Formula

Inference Layer — Semantic Cache Validation

Mixed enterprise workload profile · Exact + semantic cache combined hit rate: 66.6%

Request Type	% of Requests	GPU Power	Contribution
Exact cache hit	51.4%	0W	0W
Semantic cache hit	15.2%	0W	0W
Cache miss → 7B model	26.0%	200W	52W avg
Cache miss → 13B model	5.0%	350W	18W avg
Cache miss → 70B model	2.3%	700W	16W avg
Weighted Average	100%	86W	87% vs 650W baseline

Summary

Three-Layer Verified Performance

Layer	Mechanism	Savings	Measurement Basis
1 — Idle States	Predictive power scheduling	89%	700W → 75W · H200 validated
2 — Inference	Semantic cache + ML routing	87%	650W → 86W · V2.0 formula
3 — Total Facility	Combined IT + cooling effect	~51%	80% cache hit · PUE 1.6

Validated Results

H200 GPU Idle State Validation — RunPod Platform

Inference Layer — Semantic Cache Validation

Three-Layer Verified Performance