Tremhost Labs Report: A Longitudinal Study of Cloud Performance Variability

For most organizations, public cloud infrastructure is treated as a stable, consistent utility. This Tremhost Labs report challenges that assumption through a six-month longitudinal study designed to measure and quantify the real-world performance variability of compute, storage, and networking on the three leading cloud platforms.

Our findings reveal that even on identically provisioned virtual machine instances, performance is far from static. Over the study period (January–June 2025), we observed significant performance fluctuations, with 99th percentile network latency spiking to over 5x the average, and storage IOPS periodically dropping by as much as 40% below their provisioned targets.

The key takeaway for technical decision-makers is that “on-demand” infrastructure does not mean “on-demand identical performance.” The inherent nature of these multi-tenant environments guarantees a degree of variability. For production systems, architecting for this instability with robust, application-level resilience and monitoring is not optional—it is a fundamental requirement for building reliable services.

 

Background

 

As of mid-2025, businesses rely on public cloud providers for everything from simple websites to mission-critical applications. However, the abstractions of the cloud can mask the complex, shared physical hardware that underpins it. This study aims to pull back the curtain on that abstraction.

By continuously measuring performance over a long period, we can move beyond simple “snapshot” benchmarks. This provides a more realistic picture of the performance an application will actually experience over its lifecycle. This is particularly critical for businesses in regions like Zimbabwe, where application performance and user experience can be highly sensitive to underlying infrastructure stability and network jitter.

 

Methodology

 

This study was designed to be objective and reproducible, tracking key performance indicators over an extended period.

  • Study Duration: January 1, 2025 – June 30, 2025.
  • Test Subjects: A standard, general-purpose virtual machine instance was provisioned from each of the three major cloud providers (AWS, Azure, GCP) in their respective South Africa data center regions.
    • Instance Class: 4 vCPU, 16 GB RAM, 256 GB General Purpose SSD Storage.
  • Test Platform & Control: A Tremhost server located in Harare, Zimbabwe, acted as the central control node. It initiated all tests and collected the telemetry, providing a consistent, real-world measurement point for network performance from a key regional business hub.
  • Automated Benchmarks:
    1. Network Latency & Jitter: Every 15 minutes, a script ran a series of 100 ping requests to each cloud instance to measure round-trip time (RTT) and its standard deviation (jitter).
    2. Storage I/O Performance: Twice daily (once at peak and once off-peak), a standardized fio benchmark was executed on each instance’s SSD volume to measure random read/write IOPS.
    3. CPU Consistency: Once daily, the sysbench CPU benchmark ran for 5 minutes to detect any significant deviations in computational speed, which could indicate resource contention (CPU steal).
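
As a rough illustration of the first benchmark, the per-run summary can be sketched as follows. This is a minimal sketch, not the study's actual script: the function name `summarize_ping_run` and the sample values are hypothetical, and it assumes jitter is taken as the sample standard deviation of the RTTs, in line with the definition above.

```python
import statistics

def summarize_ping_run(rtts_ms):
    """Return (mean RTT, jitter) for one run of ping samples.

    Jitter is computed as the sample standard deviation of the RTTs.
    """
    if len(rtts_ms) < 2:
        raise ValueError("need at least two RTT samples")
    return statistics.mean(rtts_ms), statistics.stdev(rtts_ms)

# Hypothetical RTTs in milliseconds (the study used 100 pings per run).
sample_run = [21.0, 22.0, 23.0, 22.0, 22.0]
mean_rtt, jitter = summarize_ping_run(sample_run)
print(f"mean={mean_rtt:.2f} ms, jitter={jitter:.2f} ms")
```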

 

Results

 

The six months of data revealed significant variability, particularly in networking and storage.

 

Network Latency (Harare to South Africa)

 

Average latency was stable for all three providers, but tail latency was not: the outlier events were large relative to the average.

| Cloud Provider | Average RTT | p95 RTT | p99 RTT | Key Observation |
| --- | --- | --- | --- | --- |
| GCP | 22 ms | 35 ms | 115 ms | Consistently lowest average, but subject to occasional large spikes. |
| AWS | 25 ms | 48 ms | 130 ms | Higher average and more frequent moderate spikes than GCP. |
| Azure | 28 ms | 55 ms | 145 ms | Highest average latency and most frequent outlier events. |

The crucial finding is that for all providers, the 99th percentile latency (the level reached only by the slowest 1% of measurements) was a little over 5 times the average: roughly 5.2x for each provider.
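
The ratio between tail and average latency can be checked directly from the reported figures. The sketch below also includes a nearest-rank percentile helper; the report does not say which percentile method was used, so `percentile` is an assumption for illustration only.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# (average RTT, p99 RTT) in milliseconds, taken from the results table.
reported = {"GCP": (22, 115), "AWS": (25, 130), "Azure": (28, 145)}

# How many times larger the p99 is than the average, per provider.
ratios = {name: p99 / avg for name, (avg, p99) in reported.items()}
for name, ratio in ratios.items():
    print(f"{name}: p99 is {ratio:.2f}x the average")
```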

 

Storage I/O Performance

 

The benchmark measured the performance of a general-purpose SSD volume provisioned for a target of 3000 IOPS.

| Cloud Provider | Provisioned IOPS | Avg. Observed IOPS | Min. Observed IOPS | Key Observation |
| --- | --- | --- | --- | --- |
| All Providers | 3000 | ~2950 | ~1800 | Performance periodically dropped to ~60% of the provisioned target. |

The data showed that while the average performance was close to the advertised target, all three providers exhibited periods where actual IOPS dropped significantly. These dips typically lasted for several minutes and often occurred during peak business hours in the cloud region.
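
To make the arithmetic concrete, the shortfall against the provisioned target can be computed and used to flag dips. This is a minimal sketch: the 10% tolerance, the function names, and the reading labels are assumptions for illustration, not part of the study's tooling.

```python
PROVISIONED_IOPS = 3000  # provisioned target from the study

def shortfall_pct(observed_iops, provisioned=PROVISIONED_IOPS):
    """Percentage by which observed IOPS fall below the provisioned target."""
    return (provisioned - observed_iops) / provisioned * 100

def flag_dips(readings, provisioned=PROVISIONED_IOPS, tolerance_pct=10.0):
    """Return the (label, iops) readings that fall short by more than
    tolerance_pct percent. The 10% default is an arbitrary assumption."""
    return [
        (label, iops)
        for label, iops in readings
        if shortfall_pct(iops, provisioned) > tolerance_pct
    ]

# The average and minimum figures from the table, with hypothetical labels.
readings = [("average", 2950), ("minimum", 1800)]
dips = flag_dips(readings)
print(dips)
print(f"minimum is {shortfall_pct(1800):.0f}% below target")
```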

 

CPU Performance

 

CPU performance was the most stable metric. Across all providers, the daily sysbench score varied by less than 2%, indicating that CPU “noisy neighbor” effects, while technically present, were not a significant source of performance variation for this class of instance.
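
One simple way to express this kind of stability check is the spread between the best and worst daily score relative to the mean. The report does not state how the "less than 2%" figure was computed, so the definition below is an assumption, and the scores are hypothetical.

```python
import statistics

def relative_spread_pct(scores):
    """Return (max - min) as a percentage of the mean score."""
    if not scores:
        raise ValueError("no scores")
    return (max(scores) - min(scores)) / statistics.mean(scores) * 100

# Hypothetical daily benchmark scores (events per second).
daily_scores = [1000.0, 1005.0, 995.0, 1002.0, 998.0]
spread = relative_spread_pct(daily_scores)
print(f"spread = {spread:.2f}% of the mean")
```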

 

Analysis: The “Why” Behind the Variability

 

The observed fluctuations are not bugs; they are inherent properties of large-scale, multi-tenant cloud environments.

  • The “Noisy Neighbor” Effect: This is a likely contributor to the I/O variability. General-purpose SSD volumes are typically network-attached and backed by storage hardware shared with other customers’ workloads. If a “neighbor” on the same shared infrastructure starts a massive, I/O-intensive operation, it can create contention and temporarily reduce the resources available to your instance. Contention of this kind is consistent with the periodic IOPS drops, which often occurred during peak business hours in the cloud region.
  • Network Path Dynamics: The internet is not a single, static wire. The path between Harare and the South Africa cloud regions can be re-routed by ISPs or within the cloud provider’s own backbone to handle congestion or link failures. These re-routes can cause transient latency spikes, and the observed p99 spikes are consistent with this real-world network behavior.
  • Throttling and Burst Credits: Some general-purpose volume types manage storage performance with credit-based bursting systems. While your volume may be provisioned for 3000 IOPS, this can come with a “burst balance.” If your application has a period of very high I/O, it can exhaust its credits, at which point the provider will throttle its performance down to a lower, baseline level until the credits replenish.
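
The credit-based throttling described in the last bullet can be illustrated with a simplified token-bucket model. This is a sketch under stated assumptions, not any provider's actual algorithm: the one-second time step, the function name `simulate_burst`, and every number in the example (baseline, credit capacity, demand) are hypothetical.

```python
def simulate_burst(demand_iops, baseline_iops, burst_iops, credit_capacity,
                   initial_credits):
    """Simulate a simplified credit bucket in one-second steps.

    Each second the volume earns baseline_iops credits and spends one
    credit per I/O served. It can serve up to burst_iops while banked
    credits last, then falls back to baseline_iops.
    Returns the IOPS delivered in each second.
    """
    credits = initial_credits
    delivered = []
    for demand in demand_iops:
        # Available this second: newly earned credits plus the bank.
        available = baseline_iops + credits
        served = min(demand, burst_iops, available)
        # Bank what is left, never exceeding the bucket capacity.
        credits = min(credit_capacity, available - served)
        delivered.append(served)
    return delivered

# Hypothetical volume: 1000 IOPS baseline, 3000 IOPS burst, small bucket.
delivered = simulate_burst(
    demand_iops=[3000] * 5,
    baseline_iops=1000,
    burst_iops=3000,
    credit_capacity=4000,
    initial_credits=4000,
)
print(delivered)
```

In this toy run the volume sustains the burst rate only until the banked credits are spent, then drops to the baseline, which is the general shape of throttling the bullet describes.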

 

Actionable Insights & Architectural Implications

 

  1. Architect for the p99, Not the Average: Do not design your systems based on average latency or IOPS figures. Your application’s stability is determined by how it handles the “worst case” scenarios. Implement aggressive timeouts, automatic retries with exponential backoff, and circuit breakers in your application code to survive these inevitable performance dips.
  2. Application-Level Monitoring is Essential: Your cloud provider’s dashboard will show that their service is “healthy.” It will not show you the 120 ms latency spike that caused your user’s transaction to fail. The only way to see what your application is truly experiencing is to implement your own detailed, application-level performance monitoring.
  3. Embrace Resilient, Frugal Design: For businesses where performance directly impacts revenue, this study underscores the need for resilient architecture. This means building systems that can degrade gracefully. For example, if a database connection is slow, can the application serve cached or partial content instead of failing completely? This approach to “frugal resilience”—anticipating and mitigating inherent cloud instability—is a hallmark of mature cloud engineering.
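
Two of the patterns above, retries with exponential backoff and graceful degradation to cached content, can be sketched together. This is a minimal illustration under stated assumptions, not a production implementation: the names `call_with_retries` and `fetch_with_fallback` are hypothetical, the operation is assumed to enforce its own timeout, the broad `except Exception` would normally be narrowed to transient errors, and circuit breaking is left out for brevity.

```python
import random
import time

def call_with_retries(operation, attempts=4, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep):
    """Call operation(); on failure, retry with exponential backoff.

    The wait before each retry is a random value between 0 and the
    current backoff ceiling, which doubles per attempt up to max_delay.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                ceiling = min(max_delay, base_delay * 2 ** attempt)
                sleep(random.uniform(0, ceiling))
    raise last_error

def fetch_with_fallback(operation, cache, key, **retry_kwargs):
    """Return (value, "fresh") on success, or (cached value, "stale")
    if every attempt fails and a cached copy exists."""
    try:
        value = call_with_retries(operation, **retry_kwargs)
    except Exception:
        if key in cache:
            return cache[key], "stale"
        raise
    cache[key] = value
    return value, "fresh"

# Demo: a hypothetical backend that fails twice, then succeeds.
calls = {"count": 0}

def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated slow backend")
    return "row-data"

cache = {}
# sleep is stubbed out here only so the demo runs instantly.
result = fetch_with_fallback(flaky_query, cache, "user:1", sleep=lambda s: None)
print(result, "after", calls["count"], "calls")
```

If a later call fails on every attempt, `fetch_with_fallback` returns the cached value tagged `"stale"` instead of raising, so the caller can decide how to present degraded content.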
