You’re merging onto the highway. From the on-ramp you can see the steady flow of cars at 65 mph (105 kph). You hit the gas to get up to speed with traffic, and oh <expletive> – all of a sudden you are stuck at 35 mph (56 kph). Maybe it’s a restricted exhaust, who knows, and it really doesn’t matter at that point … what are you thinking and feeling? You expected to have the performance to accelerate to 65 mph when you merged onto the highway. How do you think the people behind you on the on-ramp and those barreling down the highway feel? They also likely expected you to have the performance when you needed it.
Application performance is a cornerstone of managing customer experience and satisfaction. In a highly competitive market where expectations are high, it can mean the difference between a highly satisfied customer and churn. And in a complex, dynamic environment, delivering performance when you need it requires orchestration across a broad set of components. It is useful to understand what characterizes typical performance analysis, since that drives an understanding of your application and, more broadly, your solution.
At its most basic level, performance is characterized by some input, called load, and the resulting responses, measured in terms of throughput and response time. In the on-ramp analogy, you can think of “hitting the gas pedal” as the load, the resulting speed as the throughput, and the time between pressing the pedal and reaching your desired speed as the response time. In the telecommunications space, the input load could be Diameter credit control requests, subscriber management queries, recurring processing, etc. How many requests can be satisfied per second (throughput) and how fast the answers get back to the requester (response time) are especially crucial in telecommunications because one is dealing with human interaction and perception – someone is trying to place a phone call, check their email, or join a sharing group. And that “someone” is just one of potentially tens of millions of subscribers performing the same types of actions on the network.
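To make the two measures concrete, here is a minimal Python sketch that drives a fixed load through a handler and reports throughput and response time. The `handle_request` function is a hypothetical stand-in for real work such as answering a credit control request; it is not part of any MATRIXX API, and a real test would issue requests concurrently rather than one at a time.

```python
import statistics
import time


def handle_request(request_id: int) -> int:
    """Hypothetical stand-in for real work (an assumption, not a real API)."""
    return sum(range(1000)) + request_id


def measure(num_requests: int) -> dict:
    """Send num_requests sequentially; report throughput and response times."""
    latencies = []
    start = time.perf_counter()
    for i in range(num_requests):
        t0 = time.perf_counter()
        handle_request(i)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    latencies.sort()
    # Approximate 95th percentile by index into the sorted latencies.
    p95 = latencies[int(0.95 * (num_requests - 1))]
    return {
        "requests": num_requests,
        "throughput_per_s": num_requests / elapsed,
        "mean_response_s": statistics.mean(latencies),
        "p95_response_s": p95,
    }


if __name__ == "__main__":
    result = measure(10_000)
    print(f"throughput: {result['throughput_per_s']:.0f} requests/s")
    print(f"mean response time: {result['mean_response_s'] * 1e6:.1f} us")
    print(f"p95 response time: {result['p95_response_s'] * 1e6:.1f} us")
```

Reporting a percentile alongside the mean is a common choice because a few slow responses can hide behind a healthy-looking average.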
It is most effective to apply performance analysis both holistically and at the level of individual components. At MATRIXX, we use what could be viewed as a decomposition approach. There is the overall solution that needs to perform, and it is composed of a set of systems, which run a set of applications, which are built on a set of algorithms and methods. If one of those fails to deliver or becomes a bottleneck, it can put a drag on the entire solution – like trying to merge onto the highway at 35 mph (56 kph). So it is important that your attention to performance spans all the components over which you have oversight. In our development, we have built performance tests that range from component-level tests operating on individual methods up to system-wide benchmarks that exercise the breadth of our solution. There is no magic formula, but insight comes from continually refining our performance analysis.
Let’s talk a little bit more about bottlenecks. Bottlenecks limit the ability to process input load due to resource contention, e.g., an application has consumed all available CPU, or a hard drive is spending more time seeking the next data record than processing it. One of the more common approaches to rooting out bottlenecks is to use benchmarks. Benchmarks are just a form of load. Besides the target input, such as calls per second, they are usually structured around a base configuration of the product on a given set of hardware. The configuration includes the initial conditions of the data, such as the number of subscribers in a database. At MATRIXX, we maintain internal benchmarks and work with customers to integrate realistic scenarios. These benchmarks provide a point of reference for evaluating performance, because throughput and response times are their standard outputs.
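One way to picture a benchmark as a point of reference is the Python sketch below. It records the base configuration with each run and only compares runs that share it. All names, fields, and the 5% tolerance are illustrative assumptions, not a description of MATRIXX’s internal benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkConfig:
    """Base configuration of a benchmark (all fields are illustrative)."""
    name: str
    target_calls_per_s: int  # target input load
    subscribers: int         # initial condition of the data
    hardware: str            # label for the hardware the run used


@dataclass(frozen=True)
class BenchmarkRun:
    """Standard outputs of one run: throughput and response time."""
    config: BenchmarkConfig
    throughput_per_s: float
    p95_response_ms: float


def compare(baseline: BenchmarkRun, current: BenchmarkRun,
            tolerance: float = 0.05) -> list:
    """Return regressions of `current` relative to `baseline`."""
    # A baseline is only a valid reference under the same configuration.
    if baseline.config != current.config:
        raise ValueError("runs use different benchmark configurations")
    findings = []
    if current.throughput_per_s < baseline.throughput_per_s * (1 - tolerance):
        findings.append("throughput regressed")
    if current.p95_response_ms > baseline.p95_response_ms * (1 + tolerance):
        findings.append("response time regressed")
    return findings


if __name__ == "__main__":
    cfg = BenchmarkConfig("voice-calls", 1000, 10_000_000, "reference-server")
    baseline = BenchmarkRun(cfg, throughput_per_s=1000.0, p95_response_ms=10.0)
    current = BenchmarkRun(cfg, throughput_per_s=900.0, p95_response_ms=10.2)
    print(compare(baseline, current))
```

Refusing to compare runs with different configurations keeps a hardware or data-size change from being mistaken for a performance change in the software.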
Once you’ve done the tuning, and potentially refactoring, to squeeze the optimal performance out of an application running on a given system, what do you do next? For example, suppose you want your system to go from 10 million to 20 million subscribers, with a corresponding doubling of the throughput demand. However, most of the system resources are already efficiently utilized serving 10 million subscribers. This is the point when scaling comes into play. Scaling is the addition of more resources, e.g., CPU, disk, and/or network bandwidth, to improve performance. Scalability is the ability of a system to make use of those additional resources. Scaling can be applied vertically, which means increasing the resources within a server, or horizontally, which means increasing the number of servers. In our example of moving from 10 million to 20 million subscribers, we could either double the CPU, memory, and disk of the servers processing the 10 million subscribers (vertical scaling) or add a second set of the same servers (horizontal scaling). While it is often preferable to scale vertically since it limits the overall number of servers, there is a trade-off: large-scale servers become increasingly expensive compared to commodity servers. So multiplying the number of servers (scaling horizontally) becomes the cost-efficient approach when the target subscriber count and throughput grow beyond the capacity of vertical scaling.
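The horizontal-scaling arithmetic in this example can be sketched in a few lines of Python. The function name and the `headroom` parameter are assumptions added for illustration, and the sketch assumes capacity grows linearly with the number of server sets, which real systems only approximate.

```python
import math


def servers_needed(target_subscribers: int,
                   subscribers_per_server_set: int,
                   headroom: float = 0.0) -> int:
    """Number of identical server sets needed for a target subscriber count.

    headroom is the fraction of each set's capacity held in reserve
    (an illustrative assumption; 0.0 means every set runs fully utilized).
    """
    if subscribers_per_server_set <= 0:
        raise ValueError("capacity per server set must be positive")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    usable = subscribers_per_server_set * (1.0 - headroom)
    return math.ceil(target_subscribers / usable)


if __name__ == "__main__":
    # One set of servers handles 10 million subscribers; the target is 20 million.
    print(servers_needed(20_000_000, 10_000_000))                # 2
    # Holding 20% of each set in reserve requires a third set.
    print(servers_needed(20_000_000, 10_000_000, headroom=0.2))  # 3
```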
As a closing note, please keep in mind that the pursuit of performance cannot be unbridled. One can arbitrarily achieve fast performance if functionality and robustness are disregarded. But that is a topic for a future blog.