Capacity for Tomorrow

In any performance testing project there always comes a time when someone suggests throwing system resources at an issue in the hope that it goes away. Often this means throwing more CPUs or servers at the problem, on the assumption that doubling the resources will double your throughput or halve your response times.


It seems like an ideal way out – hardware is cheaper than it's ever been in terms of bang for your buck, so why not scale horizontally and hope for the best? After all, what could possibly go wrong?


We know – at least anecdotally – that adding more hardware to a performance problem is never the quick fix that the project timelines would like it to be, unless the environment is woefully underprovisioned. The key to this situation is being able to say why throwing a few more servers at the problem isn't likely to work as well as we would like. There are three principles involved in systems scalability: Concurrency, Contention, and Cohesion.


Concurrency refers to either the number of concurrent users or threads offering work against the system, or the number of processors or servers available to service the offered load.


Contention refers to the degradation of a system's performance by the portion of the workload that is serial and cannot be executed in parallel. In the world of performance this can be waiting on a bus transfer from another processor, waiting for some I/O to complete, or waiting for an RDBMS latch or lock. Another way to think of it is when you are at a restaurant with a group of friends and family. You all have a great meal, you get up to pay, and there is only one till. In this case each of you is waiting on a single resource that you have to visit one at a time to complete the transaction. This is what we mean when we talk about seriality – that part of your system's workload that cannot be executed in parallel.

Contention is a key factor in Amdahl's Law, first postulated in 1967 and defined by the equation:

$$ C(N) = \frac{N}{1 + \sigma(N - 1)} $$
Here, N represents the number of processors and sigma (σ) represents the proportion of the code that must be executed serially – i.e. the bit where we need to queue to pay for our meal.
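
To make the diminishing returns concrete, here is a minimal sketch in Python that evaluates Amdahl's Law across a range of processor counts. The 5% serial fraction is an assumed figure chosen purely for illustration.

# Amdahl's Law: C(N) = N / (1 + sigma * (N - 1))
def amdahl(n, sigma):
    """Relative capacity (speed-up) for n processors with serial fraction sigma."""
    return n / (1 + sigma * (n - 1))

sigma = 0.05  # assumed: 5% of the workload is serial
for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"N={n:3d}  speed-up={amdahl(n, sigma):6.2f}")

Even with only 5% of the workload serialised, 64 processors deliver roughly a 15x speed-up rather than 64x, and no amount of hardware will push past the ceiling of 1/σ = 20x.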

Both concurrency and contention are the critical factors in speed-up – that is, how much more processing power you can add to a fixed workload and have it complete faster. Typically this relates to numerical systems performance – the number of calculations per second and the like.

The last of the three Cs is cohesion. This is any work that a processor must do to synchronize shared data, such as intermediate communication of results between threads, exchange of data between processor caches, synchronizing access to shared writable data, and a host of other things that we see in a business systems environment. If contention is having everyone at your table line up to pay their share of the bill, then cohesion is that awkward conversation about who is paying for what when someone has ordered an item from the menu that is massively more expensive than everyone else's (you know the person, the one who ordered the bottle of Moët while everyone else had Coke). No matter how many tills there may be to pay at, we still spend time having the conversation and doing the maths on how much each of us is going to fork out.

Mathematically, the Universal Scalability Law defines this as:

$$ C(N) = \frac{N}{1 + \sigma(N - 1) + \kappa N(N - 1)} $$

Kappa (κ) represents the portion of the work spent on cohesion. The interesting thing about the Universal Scalability Law, as formulated by Dr. Neil Gunther, is that it does not require you to open the code to perform the analysis. The values for sigma and kappa are derived from experimentation and observation of the system's performance, either under different workloads or under the same workload with a different number of configured threads or processors. Also, if the value of κ is 0, the Universal Scalability Law reduces to Amdahl's Law.
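
As a rough illustration of how that derivation works in practice, the sketch below fits σ and κ to a set of throughput measurements using a standard least-squares routine. The thread counts and throughput figures are invented for the example; substitute your own test results.

# A minimal sketch: deriving sigma and kappa from observed throughput.
import numpy as np
from scipy.optimize import curve_fit

def usl(n, lam, sigma, kappa):
    """USL throughput: X(N) = lambda * N / (1 + sigma*(N-1) + kappa*N*(N-1))."""
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

# Hypothetical load-test results: configured threads vs. measured throughput (tps)
n_threads  = np.array([1, 2, 4, 8, 16, 32])
throughput = np.array([95, 180, 330, 550, 760, 820])

(lam, sigma, kappa), _ = curve_fit(
    usl, n_threads, throughput,
    p0=[100, 0.05, 0.001],               # rough starting guesses
    bounds=([0, 0, 0], [np.inf, 1, 1]),  # keep sigma and kappa in sensible ranges
)
print(f"lambda={lam:.1f}  sigma={sigma:.4f}  kappa={kappa:.6f}")

Here lambda is simply the throughput of a single thread or processor, and the fitted σ and κ tell you how much contention and cohesion the system is carrying.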

So, how do these three factors play out? Simply put – the blue line represents a perfectly parallel problem, red represents an Amdahl-bound system (contention), and green represents a system that is being constrained by both contention and cohesion:

[Figure: throughput (X) versus number of processors (N) for linear scaling, Amdahl's Law, and the Universal Scalability Law]
As you can see – though it is not so apparent from the formulae – when a system has a cohesion factor (κ) we see a drop-off in throughput (X) versus the number of CPUs (N) beyond a certain point. Incidentally, this is one of the reasons why performance engineers are very reticent about performance testing in a fractional environment (i.e. ½ of production) and multiplying their results out.
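
You can reproduce the shape of those curves numerically. The values σ = 0.05 and κ = 0.002 below are assumed, chosen only to show the behaviour of the three models rather than taken from any real system.

# Relative capacity under the three models, using assumed sigma and kappa.
sigma, kappa = 0.05, 0.002

for n in (1, 8, 16, 32, 64, 128):
    linear = n                                                  # perfectly parallel (blue)
    amdahl = n / (1 + sigma * (n - 1))                          # contention only (red)
    usl    = n / (1 + sigma * (n - 1) + kappa * n * (n - 1))    # contention + cohesion (green)
    print(f"N={n:4d}  linear={linear:6.1f}  amdahl={amdahl:6.2f}  usl={usl:6.2f}")

With these assumed values the USL column climbs, peaks somewhere in the twenties, and then falls away as N keeps growing – the retrograde throughput that the green line illustrates.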

So what is it all good for? Firstly, when applied against a controlled environment, the scalability law gives us a basis for predicting either the maximum number of concurrent users or threads a system can handle on a fixed hardware platform, or the optimum hardware configuration for a fixed number of concurrent users or threads.
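
Usefully, the optimum point falls straight out of the mathematics: differentiating the USL throughput curve and setting the derivative to zero gives a peak at N* = sqrt((1 − σ) / κ). The sketch below uses the same illustrative values as before, not figures from a real system.

# Point of peak capacity predicted by the USL: N* = sqrt((1 - sigma) / kappa)
import math

sigma, kappa = 0.05, 0.002   # assumed values for illustration
n_star = math.sqrt((1 - sigma) / kappa)
print(f"Capacity peaks at roughly N = {n_star:.0f} threads/processors")
# Beyond this point, adding concurrency reduces throughput rather than improving it.

Plug in the σ and κ fitted from your own measurements and you have a defensible estimate of where adding threads or servers stops paying off.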

Secondly, understanding some of the core principles of computer systems scalability allows us to articulate to the business why adding more hardware to the problem – increasing the number of servers or CPUs available to the system – may not provide as much bang for their buck as we would all like.


Thirdly, and probably most importantly, applying the Universal Scalability Law allows you to predict the ideal combination of hardware and load on the system beyond what you may be able to actively test in the field due to tool or time constraints.

 

So when it comes to predicting the capacity of a system, or having that awkward conversation about testing in a 'fractional' environment, we simply can't make a judgement based on a linear extrapolation of the results. Sure, modern hardware is powerful and chances are you may be able to get a small improvement by increasing thread counts or hardware capacity, but where is the trade-off? Will upgrading a server's CPUs from dual quad-core to dual six-core help that much compared to the costs involved, or could we take a more measured approach and apply some science to determine where our optimum point is? The ideal situation is the latter. Scaling horizontally will help, but only to a point, and that point won't buy you as much time or throughput as you would like for the costs involved.