By: Steve Antonoff
Courtesy of App Developer Magazine
Much has been written about the complexity of application performance testing. The breadth and scope required to effectively test application architecture and transaction flow can make it a daunting effort, especially with service-oriented architectures where hundreds or even thousands of third-party services and components are added to the mix.
Nevertheless, while complexity and scope make performance testing challenging, it’s often the response to this condition—or a failure to recognize how it affects testing—that is the root of testing failures.
In some cases, organizational decision makers persist in thinking they can take the same approach to performance testing as they do with functional testing. In others, the desire to control expenses causes firms to cut corners or delay implementing technologies – decisions that set up performance testers to fail. A third complication is the very nature of software development itself, which is evolving at an incredibly rapid pace.
The following are three top “performance testing failures” that I have identified during my work with Orasi customers, specifically, and performance testing in general.
Trying to “Test Everything”
When we help a company develop performance testing requirements and we ask what they want to test, they often say, “Test everything.” This approach is feasible with functional testing, since it addresses a finite number of business events and actions that should perform in essentially the same way, no matter the number of users involved.
However, to make performance testing feasible at all, companies must narrow its scope to the items that are most important or most likely to cause (or have a history of causing) system instability. They cannot “test everything.”
The rationale behind this is fairly straightforward, when you consider the architecture of modern software. Performance testing must effectively test the behavior of an architecture as it processes commands and handles a transaction flow that can be incredibly complex.
For today’s supersized applications, it isn’t unusual to serve millions of users, engage in hundreds or even thousands of interactions with internal and/or third-party services, and conduct hundreds of thousands of discrete actions (think processing airline tickets or emptying shopping carts).
For a performance test to be satisfactory, it must replicate the transaction flow for an appropriate number of users across a sufficient number of scenarios to produce a meaningful result, which requires a different data record for each user and scenario. With functional testing, testers can use the same piece of data over and over with no negative impact on test results. This isn’t true for performance testing.
Consider a “simple” test with 10 records per user. If the test must mimic the activity of 100,000 users, it will require one million records. We worked with one company where a single test required 3.7 million records.
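To make that data requirement concrete, the sketch below (in Python, using hypothetical field names and the sizing from the example above) generates a unique record for every virtual user and iteration, so a load test never reuses the same data the way a functional test can:

    import csv
    import uuid

    # Hypothetical sizing from the example above: 100,000 virtual users,
    # 10 unique records each, for 1,000,000 records in total.
    VIRTUAL_USERS = 100_000
    RECORDS_PER_USER = 10

    def generate_test_data(path: str) -> None:
        """Write one unique record per (virtual user, iteration) pair."""
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["vuser_id", "iteration", "account_id", "order_token"])
            for vuser in range(VIRTUAL_USERS):
                for iteration in range(RECORDS_PER_USER):
                    # Unique values prevent the data reuse that is harmless in
                    # functional testing but skews performance results through
                    # cache hits, duplicate-key errors, and row contention.
                    writer.writerow([vuser, iteration,
                                     f"ACCT-{uuid.uuid4().hex[:12]}",
                                     uuid.uuid4().hex])

    if __name__ == "__main__":
        generate_test_data("perf_test_data.csv")
        print(f"Generated {VIRTUAL_USERS * RECORDS_PER_USER:,} unique records")

The field names and file format are illustrative only; the point is that data volume scales with users times iterations, which is why scope must be narrowed before test data is even built.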
Determining scope can be one of the most frustrating elements of test development. Yet, when testing and development teams give in to stakeholders and try to test too much, they are setting themselves up for failure.
Testing Outside Reality
Beyond the challenge of adequately addressing the exploding pool of users, activities, and datasets, many companies fail to understand that it’s not a good idea to performance test outside of reality. By this, I mean testing when it’s convenient for the company or when the system under test is not in an optimal condition for testing.
Testing production servers is a perfect example. Companies want their teams to run performance tests at night or during other off-hours to avoid adding to server load and slowing down production operations. This is problematic because off-hours testing frequently runs side by side with nighttime processes, such as backups, that distort performance results.
Virtualized servers are also a problem, due to their architecture. Virtualization features such as dynamic load balancing can make it impossible to test live servers effectively. If a test runs on a virtualized server while the load on neighboring servers happens to be light, the results will neither be accurate nor predict live performance, even if that virtualized server is hit with a massive load.
Servers that are not yet in production can be even more prone to problems, if the initial work isn’t rooted in reality. The scripts testers develop with virtualized images of a system (or less desirably, stubs) are only as accurate as the information the teams receive.
I worked with one customer for whom we had developed scripts and test cases based on the information provided, completed the testing, and received the green light to go live. The system immediately crashed. A new process that used significant resources and would run on the server 90% of the time had been added after the tests were developed, and that information had never made it to the development or testing teams.
If a system is in production, testers can and should view the logs (or, even better, use log analyzers) to determine what apps and processes are running over a period of time, such as two days. Optimally, they should incorporate that information into the data they provide to virtualization tools. If they haven’t yet deployed such tools, they must at least use the data to project the differential between reality and current conditions.
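As a rough illustration of that log review, the sketch below (assuming a simple syslog-style log format and a hypothetical file path) counts how often each process appears over the last two days, producing a profile that can feed virtualization tools or at least adjust test assumptions:

    import re
    from collections import Counter
    from datetime import datetime, timedelta

    # Assumes a syslog-style line: "Mar 14 02:05:11 host process[pid]: message"
    LINE_RE = re.compile(
        r"^(?P<ts>\w{3}\s+\d+\s[\d:]+)\s+\S+\s+(?P<proc>[\w./-]+)(?:\[\d+\])?:")

    def profile_processes(log_path: str, window_days: int = 2) -> Counter:
        """Count log entries per process over the most recent window."""
        cutoff = datetime.now() - timedelta(days=window_days)
        counts = Counter()
        with open(log_path, errors="replace") as f:
            for line in f:
                m = LINE_RE.match(line)
                if not m:
                    continue
                # Syslog timestamps omit the year; assume the current year.
                ts = datetime.strptime(m.group("ts"), "%b %d %H:%M:%S")
                ts = ts.replace(year=datetime.now().year)
                if ts >= cutoff:
                    counts[m.group("proc")] += 1
        return counts

    if __name__ == "__main__":
        # Hypothetical path; point this at the server's actual logs.
        for proc, n in profile_processes("/var/log/syslog").most_common(20):
            print(f"{proc:30s} {n:>8d} entries")

A dedicated log analyzer will do this far better; the value of even a crude profile is that it documents what else competes for resources while the test runs.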
If testers are testing a new system that is not in production, virtualization will become a necessity to achieve meaningful results. Since they are totally dependent upon the customer for information, team leaders must circle back several times and make sure the specs of the production server and its anticipated processes haven’t changed since project inception.
Acceleration of Development Technologies
A third problem that occurs with performance testing – and one for which the answer isn’t quite as simple—is the pace of technology, especially on the development side. New development tools debut and update continually, and progressive developers want (and are often pressured by the competitive environment) to use them right away.
Unfortunately, testing tools, and testers, are always behind. It takes a year or more for the companies that create testing tools to update their products to support new protocols, languages, and other technologies. In the meantime, testers still need to test. Testers who are extremely intuitive and inventive can sometimes succeed in this environment.
Otherwise, it’s nearly impossible for a performance tester to conduct reliable testing.
The increasing complexity of development suites is also a challenge. Competition in this arena is fierce, and software firms release updates that support new technologies and offer new feature sets at a blinding pace.
It used to be that testers could work around development suites, and some still can, by working at the level of older, stable technologies such as HTTP and HTML. However, even if newer versions of development suites are backward compatible with older protocols, success might hinge on custom programming or the development of custom DLLs or scripts. If these items are outside the tester’s frame of reference, performance testing suffers or can even fail.
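Where such a workaround is viable, it can be as simple as driving load directly at the HTTP layer and ignoring the client-side framework entirely. The sketch below is a rough illustration only, with a hypothetical endpoint and load shape rather than a substitute for a real load-testing tool; it simulates concurrent users issuing plain HTTP requests and reports a 95th-percentile response time:

    import time
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical target and load shape; a real test would mirror the
    # production transaction mix identified during scoping.
    TARGET_URL = "https://example.com/api/checkout"
    CONCURRENT_USERS = 50
    REQUESTS_PER_USER = 20

    def run_user(user_id: int) -> list:
        """One simulated user issuing plain HTTP requests and timing responses."""
        timings = []
        for _ in range(REQUESTS_PER_USER):
            start = time.perf_counter()
            with urllib.request.urlopen(TARGET_URL, timeout=30) as resp:
                resp.read()  # drain the body so the timing includes transfer
            timings.append(time.perf_counter() - start)
        return timings

    if __name__ == "__main__":
        with ThreadPoolExecutor(max_workers=CONCURRENT_USERS) as pool:
            results = list(pool.map(run_user, range(CONCURRENT_USERS)))
        all_times = sorted(t for user in results for t in user)
        p95 = all_times[int(len(all_times) * 0.95) - 1]
        print(f"requests: {len(all_times)}, p95 latency: {p95:.3f}s")

This only works when the application’s essential behavior is still visible at the HTTP level; when it isn’t, the gap between development and testing tools reasserts itself.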
Making Room for Performance Testing
These examples are not the only causes of performance testing problems, but they are certainly big contributors. All of them can be improved through better education and stronger collaboration among stakeholders, business analysts, developers, and testers. Implementing advanced approaches and technologies, such as test data management and service and network virtualization, is also highly beneficial.
At the end of the day, though, organizations involved in software development must put more emphasis on performance testing. An application can have great functionality, but in today’s market, if the application architecture is flawed or transaction flow bottlenecks aren’t resolved, the resulting product will disappoint. Additionally, testers must be given the tools and training they need to work around the disparities between development and testing—and to provide the most reliable results possible.