In this two-part podcast, Rob Steward explains how writing quality benchmarks affects application performance, and he offers some tips on what to look for when developing them.
Part 2, which runs 5:04, explains why benchmarks should actually retrieve the data, and why the length of time a benchmark runs is important.
Click on the following link to listen to the podcast: http://dataaccesshandbook.com/media/RobSteward_Benchmarks_2.mp3
Rob Steward
Another thing I see really often with data access code is that the code people put in the benchmark does not match the code they actually have in their application. The confusion comes about because most people don’t understand what’s actually going on underneath, between the driver and the database.
For example, here’s a thing that I see all the time: I get a benchmark, the benchmark executes a statement and then it goes fetch, fetch, fetch, and it fetches every row in the result set, or maybe it doesn’t fetch at all. But what I see a lot is a fetch where they never actually retrieve the data within the row. Now I’ve never seen a business application, a real-world application, that executes a statement, fetches just to move through the rows in the result set, but never actually gets the data out of those rows. What most people don’t understand is that underneath, when you don’t actually ask for that data, there are some very different things that happen.
Different drivers may be architected or coded to work differently. Some may wait for you to say give me column two, give me column three. Some may go ahead and pre-fetch it. And it may depend on some other things that you’re doing, such as whether or not you’re using scrollable cursors. It may matter whether, and when, the data is actually fetched over to the client. So that’s one thing I see often: execute, fetch, fetch, fetch, fetch, fetch, but never actually get the column data. What happens is, the benchmark performs one way, but the real-world application performs in a very different way when they roll it out.
I’ll talk about another really common thing I see. A lot of people, when running their benchmark, will measure the time it takes to do some particular operation. The problem with that is this: let’s say you’re measuring how long it takes to execute a single statement and fetch the results, and it may take 100 milliseconds to do that. The problem is there are a lot of variables on a single execution that can cause your times to vary very widely. In the example that I just gave you, you may execute that statement once and it may take you five seconds the first time you do it. The second time you do it, it may take 100 milliseconds. Now there is a really big difference, if I’m trying to predict what that application is going to look like, between that particular statement taking five seconds versus 100 milliseconds.
I talk in detail in the book about why that’s true, and why there is variability, so I’m not going to go into that here. But what I will say is what you want to do is take a fixed period of time and see how many times you can do that operation. So, for example, instead of start the clock, execute the statement, fetch the result, stop the clock, what you should do is start the clock, say I’m going to run for five minutes, and then execute and fetch as many times as you can within those five minutes. And what that gives you is a good predictor of what the average person is going to see as they run your application. So do it over a fixed period of time. And that fixed period of time needs to be long enough to deal with some of the things that I talk about in the book, such as warm-up times with just-in-time compilers in Java or .NET, or system clocks on machines that are not accurate to the 100ths of milliseconds, or those kinds of things.
I go through a lot of reasons in the book why you want to do this, but the takeaway here that I would like people to know is that you don’t want to measure how long a very specific operation takes. You want to say, ‘how many times can I do that operation over a specific period of time?’ And typically when I write my benchmarks, five minutes or ten minutes is the amount of time that I use to measure those things, because that typically takes a lot of the variability out of what you’re going to see in your benchmarks.
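As a rough illustration of the two points above (get the data out of every row, and count operations over a fixed period instead of timing one operation), here is a minimal Python sketch. It is not taken from the podcast or the book. The in-memory SQLite database, the `orders` table, and the function names `make_connection`, `execute_and_read`, and `run_for` are all assumptions chosen for illustration. SQLite runs in-process, so it only stands in for a real driver talking to a database server and will not show the driver-level differences described above.

```python
import sqlite3
import time


def make_connection(rows=1000):
    """Build a throwaway in-memory database (a stand-in for a real server)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, total) VALUES (?, ?)",
        [(f"customer-{i}", i * 1.5) for i in range(rows)],
    )
    conn.commit()
    return conn


def execute_and_read(conn):
    """One operation: execute, fetch every row, and touch every column."""
    cursor = conn.execute("SELECT id, customer, total FROM orders")
    rows_read = 0
    for row in cursor:
        # Read each column the way the application would, rather than
        # only stepping through the result set.
        order_id, customer, total = row[0], row[1], row[2]
        rows_read += 1
    return rows_read


def run_for(duration_seconds, operation):
    """Repeat `operation` until the time budget is spent.

    Returns (number of completed operations, elapsed seconds).
    """
    count = 0
    start = time.perf_counter()
    deadline = start + duration_seconds
    while time.perf_counter() < deadline:
        operation()
        count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed
```

A call such as `count, elapsed = run_for(300, lambda: execute_and_read(make_connection()))` would rebuild the database on every pass, which is probably not what you want to measure; creating the connection once and passing `lambda: execute_and_read(conn)` keeps setup outside the timed loop. The figure to report is `count / elapsed`, operations per second over the whole run, with 300 seconds matching the five minutes mentioned above.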
Again, there’s a lot of information and a lot of good tips within the book on writing good benchmarks. But if you’re not modeling what you do in the real world, then you need to make your benchmark much closer to that environment.