The Data Access Handbook explains why virtualization matters for database computing. In this podcast, Rob Steward discusses the implications virtualization has for the performance of database applications. The podcast runs 6:30.
Click on the following link to listen to the podcast: http://blogs.datadirect.com/media/RobSteward_June4_Virtualization.mp3
Rob Steward:
As you said, virtualization is a trend that’s been going on for several years now, but particularly over the last two to three years we’ve seen quite an increase in customers virtualizing their environments; obviously this is driven by cost control. As you know, virtualization is when people take a lot of machines and consolidate them onto one piece of hardware, so they have multiple operating systems running as virtual machines. And the question has been asked of me a number of times, ‘how does this affect the overall performance of my database-centric applications?’
We’ve done a lot of benchmarking, done a lot of testing, and we find that it actually has a significant effect. You kind of have to look at why people want to do virtualization in the first place to understand why it has an effect. Really the promise of virtualization is to fully utilize your machines. You see numbers from Gartner or IDC or a number of the large analyst firms saying that the typical server today is only 10, 15, or 20% utilized. What that means is that 80 to 90% of the time that machine is doing nothing but warming the room it’s in; it’s not actually doing any useful work. So the whole idea of virtualization is, ‘hey, let’s take all those spare computing cycles that are sitting there, that we pay good money for, and actually put them into production.’ Let’s use them in a way that gets us a return on our investment in that hardware. So instead of just running one Windows machine, let’s run two or three or four or 20 – I’ve talked to people who have up to 40 images running on a single server; running 40 machines on a single piece of hardware. The idea there is that if one machine is idle at the moment, maybe another one isn’t, so utilization may jump from 10 or 15 or 20% up to 80 or 90%.
It’s important to understand the impact of virtualization on your data access. Pulling data from the database is one of the more CPU- and memory-intensive operations in any application. If you pull a large result set of 10,000, 100,000, or a million rows and you try to hold it all in memory, you’re talking about using massive amounts of memory, disk space, and CPU.
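To make that concrete, here is a minimal JDBC sketch – not an example from the podcast, and the connection URL, table, and column names are hypothetical – showing one way to keep memory bounded: hint a small fetch size and process rows as they arrive instead of holding the whole result set.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StreamingExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection URL, credentials, and table; adjust for your environment.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/orcl", "user", "password");
             PreparedStatement stmt = conn.prepareStatement(
                "SELECT order_id, total FROM orders")) {

            // Ask the driver to fetch rows in batches of 100 instead of
            // buffering the entire result set client-side.
            stmt.setFetchSize(100);

            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    // Process each row as it arrives; memory stays bounded
                    // even for result sets of a million rows.
                    process(rs.getLong("order_id"), rs.getDouble("total"));
                }
            }
        }
    }

    private static void process(long orderId, double total) {
        // Placeholder for per-row work.
    }
}
```

Whether the driver honors the fetch-size hint, and how well it manages its own buffers while doing so, is exactly the kind of middleware efficiency Rob is talking about.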
Think about an application or server that is only utilized 10% or 15% of the time, where some component of your stack, or some code you’ve written, uses CPU or memory inefficiently. It doesn’t matter so much when 80% of the time the CPU is idle anyway, because all those spare cycles absorb the bad algorithms, or the bad code, or the bad JDBC driver, or whatever it is you have. When you get into a virtualized environment and you’re now pushing the limits of that hardware again, an inefficient ODBC driver, JDBC driver, or an inefficient piece of data access code – ‘you know, I wrote this algorithm and it worked fine; it used a lot of CPU, but it worked’ – all of a sudden becomes a bottleneck for you. Your scalability starts to decrease rapidly because you’re up against those hardware limits.
Remember, again, the promise of virtualization is full utilization of your hardware. So what we’ve seen many times over the last couple of years is customers coming to me and saying, ‘I have this application, it connects to Oracle, and it runs fine with my 100 users. Now we’ve virtualized our environment – we’ve put that machine onto a virtual machine with three others – and all of a sudden this application is not performing well at all.’ And what it turns out to be is that either the data access code they’ve written – the Hibernate or the .NET code or the ODBC code or whatever it is – is not written efficiently, or they have some piece of middleware – some driver – that is written very inefficiently and uses too much CPU or too much memory. All of a sudden, because they went to the virtualized environment, that excessive use of memory and CPU and disk and network becomes a big issue. Quite frankly, I’ve sold a lot of software over the last couple of years because of that issue, because we tend to design ours to be more memory- and CPU-efficient.
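As one illustration of the kind of data access code that hides on under-utilized hardware – a sketch with hypothetical table and column names, not an example from the podcast – compare filtering rows on the client with pushing the predicate down to the database:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class FilterExample {
    // Inefficient: fetch every row, filter on the client. This burns
    // network, memory, and CPU that a virtualized host can't spare.
    static void filterOnClient(Connection conn) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(
                "SELECT customer_id, region, balance FROM customers");
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                if ("WEST".equals(rs.getString("region"))) {
                    handle(rs.getLong("customer_id"), rs.getDouble("balance"));
                }
            }
        }
    }

    // Better: push the predicate to the database and transfer only the
    // rows and columns you actually need.
    static void filterInDatabase(Connection conn) throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(
                "SELECT customer_id, balance FROM customers WHERE region = ?")) {
            stmt.setString(1, "WEST");
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    handle(rs.getLong("customer_id"), rs.getDouble("balance"));
                }
            }
        }
    }

    static void handle(long customerId, double balance) {
        // Placeholder for per-row work.
    }
}
```

On a lightly loaded server the first version ‘works’; once three or four virtual machines share the hardware, the wasted network transfer and client-side CPU show up as the bottleneck.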
To wrap it up: those issues you had 10 or 15 years ago, when the hardware was slower and you had to write better, more efficient code, sort of went away because hardware became so much cheaper. Now that we’re moving into virtualization, we’re digging that same problem back up again. Our code has to be better, our algorithms have to be better, every component in the stack has to be better in order to really see the fruits of the virtualization effort. Can I put four virtual machines on that piece of hardware, or only two? You start to see how it costs you money directly if you don’t have efficient coding techniques, or the efficient middleware that you need.
In The Data Access Handbook we talk a little bit about virtualization. There is a section that goes into some of what I just talked about – not a huge amount of detail, but the idea is that, with virtualization, all the things we talk about in the book become that much more important. When you talk about reducing the amount of network, CPU, and memory that you use, virtualization really just intensifies the need to follow those good practices.