The Implications of Excellent Coding: Part 2 – Real world examples

June 02, 2009 Data & AI

In this two-part podcast Rob Steward explains the difference between good and bad coding, and how good coding can benefit your ODBC, JDBC, or .NET applications.

Part 2, which runs 7:05, provides real-world examples of the impact that good coding can make.

Click on the following link to listen to the podcast: http://dataaccesshandbook.com/media/RobSteward_GoodCode_2.mp3

Rob Steward:

I’ll give you one other example that I like to use when I’m talking on performance, because it’s a pretty dramatic example. So I’ve been giving talks at conferences and seminars over the last 10 to 12 years now. And one of the tips that I’ve given out for a number of years now is controlling how you do transactions. So most people don’t realize that within all of those standards based APIs that I’ve talked about that you’re automatically in auto commit mode by default. What that means is, when I execute a statement, no matter how I execute it, it’s automatically committed. Now most databases don’t actually have an auto commit mode, but those standards based APIs require that by default, that’s the way you are. So I execute my insert into table, and it’s automatically committed.

Now, as apposed to manual commit mode, which is probably what most people think about when they think about transactions, where you begin a transaction, you do an insert or maybe multiple inserts or updates, and then actually issue the commit. All of these standards based APIs in order to make it easier on the programmers automatically commit everything for you. This is a good thing in most cases, but in some cases that’s not what you want to happen. Obviously if you want to tie multiple inserts and updates within a single unit of work or within a single transaction, then you want to manually control it. You want to say start the transaction, issue the statement, and then issue the commit. And I think most people understand that.

But the thing that I’ve seen over and over is that you may want to do thousands of inserts or thousands of updates, but most people don’t understand the cost of auto commit mode. So if we think about it, and think it through, what happens in the driver because of auto commit? If you’re in a database that doesn’t support an auto commit mode, which is most of them, most of them do not have a little switch that says ‘just automatically commit.’ Most of them require that what the driver does then in order to successfully do an auto commit is you execute a statement, what really happens underneath in the driver is it sends the statement across and it also sends a commit. So what that means is that for every statement that you’re executing, there’s actually two that are going across the wire. There’s your statement, and then there is a commit that goes behind it, even though you don’t see that. It’s required that the driver do that underneath you.

One of the things that a good driver or a good application will try to do is avoid network I/O. So now if we say we’re in auto commit and for every insert that we do we incur two network roundtrips. But not only that, one of the most expensive things that we do on a database is commit. So what happens on a database when we do a commit, is its got all of your changes in memory and when you do the commit, when it responds with a success, what it’s doing is guaranteeing that those changes will actually be persisted regardless of what happens next. So once you’ve got a success back from a commit or a rollback, the database is guaranteeing that no matter what happens, even if the power goes out or the machine goes down, that change is persistent. So what it has to do is, when you do the commit, is it has to flush some things out the disk. Not only does it flush them out the disk, but it tends to be non-sequential disk I/O.

So we all know that reading and writing from disk or to disk is one of the most expensive things that you can do, so we try to avoid that. When you do a commit you’re going to write to the disk. But not only that you’re going to write, typically because you’re updating the indexes and other things involved in a table, you’re going to write to very different places on the disk. So it’s non-sequential disk I/O.

The tip that we talk about in the book, and the thing that I wanted to point out on this podcast, is that if you don’t need to be in auto commit mode, then you can save significant amounts of time by switching into manual commit mode. And we go into the book exactly the code you write to do that. It’s also covered in all of the specifications for all of those APIs. But essentially, instead of going insert, insert, insert – you know, I’ve issued my four or five or 1,000 inserts or whatever it is – what I do is start transaction, move into a manual transaction control, issue all those inserts, and then do a commit.

Now I’ve given out this tip out for a number of years now. A couple of years ago I had this guy come up to me after a seminar, and he said, “Rob, I heard you say this tip a year ago, and I was doing this operation that actually involved doing this operation which actually involved doing a 5 million row insert into a DB2 table.” And he said, “It would take us 8 hours to do the insert. I went back after hearing you give this one tip on switching out of auto commit mode, changed my code, took me five minutes to do, I reran it, and it took 10 minutes.”

Now that’s an extremely good example, but most people aren’t doing 5 million row inserts all the time. But it does show you the kind of difference that one little coding change can make. They were doing the 5 million row inserts, they went from taking 8 hours to taking 10 minutes just by implementing this one tip that we cover in the book. That’s the kind of difference that good code and bad code can make. Both of those things work, but 8 hours verses 10 minutes is definitely the success or failure of that product or that project.

Those are the kinds of things that we talk about. Again, I would encourage you to read the book. We go into a lot of these things and explain why they’re true, and give you very specific code examples as to what kind of things are good verses bad.

Rob Steward