In this two-part podcast Rob Steward explains the impact of benchmarks on application performance, and he offers some tips on what to look for when developing them.
Part 1, which runs 5:32, focuses on the importance of emulating the number of users when developing benchmarks. Part 2 provides tips on why benchmarks should retrieve data and how long to run benchmarks.
Click on the following link to listen to the podcast: http://dataaccesshandbook.com/media/RobSteward_Benchmarks_1.mp3
Rob Steward
You’re absolutely right, I’ve gotten a lot of good feedback on that chapter of the book, and the reason why is it’s something everybody does. We all know as computer scientists that as we start to develop and model our applications, what’s going to happen to us when we roll out: We can develop the application, we can make sure that it works, but what we typically don’t know is how well is it going to perform in our production environment? So most people will spend some effort to determine and try to model, ‘what is this application really going to behave like when I roll it out?’
You know the typical application – we get it working, and it’s hard enough to get it working right – but you don’t typically get exceptions or errors from your code that say, ‘hey, I’m running too slow.’ Or, ‘hey, I’m going to run too slow when I get to 1,000 users, or when I get to 500 users, or when I get to 100 users, or whatever it may be.’ So a developer may develop that code, it may work correctly and it may work efficiently, and fast enough for that developer on his machine. But we’ve all seen it in our careers – over and over – when you roll something out, and all of a sudden the whole thing comes to a screeching halt because there is some component or some piece within it that is not scaling or not performing well. A lot of people try to predict that by writing benchmarks.
In my line of work, dealing with data access code, I’ve seen a whole lot of benchmarks. So what will happen is somebody will write a benchmark, they’ll look at, ‘this ODBC driver versus that ODBC driver, or this JDBC driver versus this other JDBC driver, or whatever it may be.’ They’ll try to benchmark their data access code, and then they’ll send it to me, and I’ll look at it, and I’ll tell you that 95% of what I’ve seen are what I’ll call ‘bad benchmarks’. In a nutshell, a bad benchmark is one that does not accurately reflect what my application does or is going to do in a production environment.
Because again, the goal of any benchmarking exercise is to predict either one, what is my application going to look like when I roll it out? Or a second purpose for benchmarks is – if I make some change to my application, how is that going to look in a production environment? So with that goal in mind, we’ve got to predict what our performance and scalability is going to be like in production. That being the single goal of an application benchmark, there are a lot of things that people don’t understand that make their benchmarks not accurately reflect what’s going to happen in their production environments.
For example, you’ve got to emulate the number of users. So if you’re going to have 100 users in an application, if your benchmark only tests it with a single user, you’re asking for trouble. Most performance problems are actually what I’ll call ‘scalability problems’. And scalability means as I add users or I add load to the system, does it scale the way I think it should?
For example, if I have an 8 CPU application server machine, and my data access code is sitting on that application server and accesses some Oracle server that’s on a different machine, as I start to have more and more users in there – particularly if I have one user who’s accessing it – and then I add a second user, does each user get the same response time? Does each user get the same throughput? As I add a third user, and as I add a fourth user, we all make the assumption that because that hardware’s got multiple CPUs, and I’ve multithreaded my application, that everybody’s going to see the same throughput. Or at least it’s going to be linear in some fashion. As I add 12 or 20 users to it, does the curve of my throughput of all those users scale linearly? Or, all of a sudden, does everybody start to get slower and slower as I add more users? That’s the thing that happens most often.
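[Editor's note] One way to make the "does it scale linearly?" question concrete is to compare the total throughput measured at each user count against what perfectly linear scaling would predict. The sketch below is only an illustration: the function name `scaling_efficiency` and the sample numbers are hypothetical and do not come from the podcast.

```python
def scaling_efficiency(throughput_by_users):
    """Return {users: efficiency} relative to the smallest measured user count.

    1.0 means perfectly linear scaling. Lower values mean each added
    user is getting less than a proportional share of throughput.
    """
    base_users = min(throughput_by_users)
    # Per-user throughput at the smallest load is the linear baseline.
    per_user_base = throughput_by_users[base_users] / base_users
    return {
        users: total / (users * per_user_base)
        for users, total in sorted(throughput_by_users.items())
    }


# Hypothetical measurements: total operations per second at each user count.
measured = {1: 100.0, 2: 200.0, 4: 300.0, 8: 400.0}
efficiency = scaling_efficiency(measured)
```

With these made-up numbers, efficiency stays at 1.0 up to two users and then drops to 0.75 at four users and 0.5 at eight, which is the "everybody gets slower as I add more users" pattern described above.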
Now, as I’m doing my benchmarks I’ve got to make sure that if I’m going to have 100 users, then I actually stress the application with 100 users. Now there’s a lot of software on the market, a lot of different test software that will emulate those multiple users. Or a lot of times when I write my own benchmarks, what I do is I spawn multiple threads, each one of them doing some emulation of the user. So that’s probably the number one thing that I see.
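[Editor's note] The "spawn multiple threads, each one emulating a user" approach might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the speaker's own harness: `run_benchmark`, `emulate_user`, and `fake_query` are hypothetical names, and `fake_query` merely sleeps in place of a real database call. In a real benchmark each thread would run the application's actual data access code.

```python
import threading
import time


def run_benchmark(user_action, num_users, actions_per_user):
    """Run user_action from num_users concurrent threads.

    Returns (total_actions_completed, elapsed_seconds).
    """
    # The extra party is the main thread, so every user starts together.
    start_gate = threading.Barrier(num_users + 1)
    counts = [0] * num_users  # one slot per thread, so no lock is needed

    def emulate_user(index):
        start_gate.wait()
        for _ in range(actions_per_user):
            user_action()
            counts[index] += 1

    threads = [
        threading.Thread(target=emulate_user, args=(i,))
        for i in range(num_users)
    ]
    for t in threads:
        t.start()
    start_gate.wait()  # release all emulated users at (approximately) once
    started = time.perf_counter()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - started
    return sum(counts), elapsed


def fake_query():
    # Hypothetical stand-in for real data access work (assumption).
    time.sleep(0.001)


total, elapsed = run_benchmark(fake_query, num_users=10, actions_per_user=5)
```

Running the same harness at several values of `num_users` and dividing `total` by `elapsed` gives the throughput figures needed to see whether the application keeps up as users are added.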