Server Fault Asked on November 4, 2021
Traffic to our service is not entirely predictable. To help keep the service slightly over-provisioned and to provide advance warning of any degradation resulting from an increase in traffic, we maintain a kind of "continuous buffer load generator". This generates a constant load against our production API, on top of user traffic. If we find that service is degrading, it is automatically turned off, and ideally we have a bit of time to figure out the issue and scale up before natural user traffic matches the augmented traffic that had led to service degradation. The buffer load is turned back on once the service is stable again.
While we’ve been calling this continuous traffic generation a "continuous load test", that wording seems confusing and makes it hard to distinguish from "actual" load tests (which is what I’ll call experiments with a defined beginning and end, a load pattern, and a binary pass/fail result at the end). I almost want to call it "canary traffic", since we’re sending in additional traffic to warn us of issues before our users encounter them, but that doesn’t line up well with the general understanding of "canary" in this industry.
This is an additional strategy on top of load balancing, autoscaling, etc. We’re not trying to replace any industry-standard traffic management steps here.
I suspect this is a case of not knowing the right words to Google, so:
Synthetic monitoring, also called active monitoring, is a term for artificial load that simulates what the application actually does, used in the context of measuring application performance.
Simulating your actual load is fantastic. However, consuming a sizable fraction of your production resources this way is inefficient. More importantly, the automatic disable mechanism becomes critical to maintaining good performance. Instead, throttle the synthetic load back to a minimum level at all times, and continue measuring response times and error rates. Never stop measuring, because degradation in those measurements shows the user impact of events.
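A minimal sketch of that idea in Python, assuming a hypothetical `probe` callable that performs one representative request and returns whether it succeeded. The class name, window size, and interval are illustrative and not taken from any particular monitoring tool. The loop has no off switch: it sends at a low fixed rate and keeps recording latency and errors.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Tuple


@dataclass
class ProbeStats:
    count: int
    error_rate: float
    mean_latency_s: float


class SyntheticMonitor:
    """Low-rate synthetic probe that is never disabled; it only records results."""

    def __init__(self, probe: Callable[[], bool], window: int = 100) -> None:
        # `probe` is a hypothetical callable: one representative request, True on success.
        self.probe = probe
        # Keep only the most recent `window` samples as (latency_seconds, ok).
        self.samples: Deque[Tuple[float, bool]] = deque(maxlen=window)

    def run_once(self) -> None:
        start = time.perf_counter()
        try:
            ok = self.probe()
        except Exception:
            ok = False  # a raised exception counts as a failed probe
        self.samples.append((time.perf_counter() - start, ok))

    def stats(self) -> ProbeStats:
        n = len(self.samples)
        if n == 0:
            return ProbeStats(0, 0.0, 0.0)
        errors = sum(1 for _, ok in self.samples if not ok)
        mean = sum(latency for latency, _ in self.samples) / n
        return ProbeStats(n, errors / n, mean)

    def run(
        self,
        iterations: int,
        interval_s: float = 1.0,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        # Fixed, minimal rate: one probe per interval, regardless of service health.
        for _ in range(iterations):
            self.run_once()
            sleep(interval_s)
```

The rolling `stats()` values would feed whatever alerting is already in place; the probe itself stays cheap enough that it never needs to be turned off.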
Realistic load generators are good for testing and capacity planning. Provision a different compute instance size in a test environment, and push the load until it falls over. As a part of a high availability test or rolling upgrade, add some load temporarily to validate an otherwise idle system.
Decide what the response time objective is. Learn how many requests per second the service can safely handle. Set autoscaling or alerts at actionable thresholds.
Measuring service level objectives and knowing the limits will give you the tools to do proper capacity planning, without artificially burning your buffer capacity.
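As a rough illustration of turning those three steps into numbers, the sketch below assumes you already have latency samples (in milliseconds) from a load test at several request rates. The function names, the 95th percentile, and the 80% headroom factor are assumptions for the example, not recommendations.

```python
import math
from typing import Dict, List, Optional


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile; pct is in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def safe_request_rate(
    results: Dict[int, List[float]],
    objective_ms: float,
    pct: float = 95.0,
) -> Optional[int]:
    """Highest tested rate (requests/second) reached before the objective is first missed.

    `results` maps a tested request rate to the latencies observed at that rate.
    Rates are checked in ascending order and the walk stops at the first failure.
    """
    safe = None
    for rps in sorted(results):
        if percentile(results[rps], pct) > objective_ms:
            break
        safe = rps
    return safe


def alert_threshold(safe_rps: int, headroom: float = 0.8) -> float:
    """Request rate at which to alert or scale, leaving some margin below the limit."""
    return safe_rps * headroom
```

With a measured safe rate in hand, the alert or autoscaling trigger sits some margin below it, so there is time to react before the response time objective is actually missed.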
Answered by John Mahowald on November 4, 2021