Flowmon Studio #13 - What Stuart Smith has learned during 17 years of expertise in APM

October 02, 2017 Flowmon, Infrastructure Management

Some of you may have seen our Flowmon Studio series. Over the years, we’ve become experts in network visibility and security. It appears that becoming experts in video shooting will take more than our current 12 episodes. The recording of our interview with Stuart Smith, an APM expert with 17 years of experience and the newest member of the Flowmon UK team, went wrong. But such a small failure was never going to stop me from sharing Stuart’s priceless thoughts, at least the old way: in written form.

I've come across quite a few companies who were sceptical about APM because, in their experience, both deployment and operation of such technologies are a nightmare. What is your opinion?

Historically, people have always been scared of APM. They think: “I’ve got to install agents, I’ve got to restart my application, I need to do it out of hours.” It just seems complicated and fiddly, but it isn’t. APM doesn’t need to be complex. Many APM solutions require the deployment of agents and hooks into JVMs or .NET instances. This is not necessary, especially with a network-based approach, which is completely passive and works out of the box.

What’s the most common driver for an APM purchase? Problem number one?

End users don’t tend to say: “The database server is so slow today.” They always say that the network is slow today. So the network guys need to defend themselves and often say it’s not the network. So is it the end station then? The application server or the back end? CTOs need to get an amber alarm before end users are affected, and they need to know what the root cause is. It’s all about mean-time-to-resolution, mean-time-to-innocence. Many APM solutions stop short of actually showing where the issue is; they just show you that “It’s not the application or server. It must be the network.” But what on the network? Is it latency, is it deeper TCP issues like packets dropping, is it a flood of network traffic? That’s what everybody wants from APM. Or performance monitoring as a whole.
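
The triage Stuart describes (is it latency, packet loss, or the server?) can be sketched in a few lines of Python. This is only an illustration and not how Flowmon works internally: the field names, the `likely_bottleneck` function and all thresholds are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class TransactionSample:
    """One observed transaction. All field names are hypothetical."""
    round_trip_ms: float        # network round-trip time
    server_response_ms: float   # time the server took to produce the response
    retransmissions: int        # TCP retransmissions seen during the transaction


def likely_bottleneck(sample, rtt_limit_ms=100.0, server_limit_ms=500.0,
                      retrans_limit=3):
    """Return a coarse label for where the delay most likely comes from.

    The limits are arbitrary example values, not recommended settings.
    """
    if sample.retransmissions > retrans_limit:
        return "network: packet loss"
    if sample.round_trip_ms > rtt_limit_ms:
        return "network: latency"
    if sample.server_response_ms > server_limit_ms:
        return "application or server"
    return "no obvious bottleneck"


# A fast network with a slow server points away from the network team.
print(likely_bottleneck(TransactionSample(20.0, 900.0, 0)))
```

In a real deployment the limits would come from a measured baseline rather than fixed numbers.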

In your experience, how hard is it to find such proof of the root cause? Is there a lot of manual “drilling” involved?

The main problem is that vendors often focus on a single area. They will do APM, and they might do APM very well, but they don't do the network side; they don't have anything to do with network round-trip time or anything like that. They would just say: “The application is slow today, however I don’t know what to do with this.” This requires engineers to jump onto another solution and then another one to figure out a single problem. Flowmon, by contrast, is one solution, one screen where you’ve got NPM and APM all stitched together and everything makes sense in a single view.

What other questions are companies seeking answers to with regard to application performance?

It can range from “Is it affecting everyone? Is it just one location?” to the point where somebody made a change over the weekend and that affected just users with Chrome or Firefox. So something has obviously changed somewhere and now it’s not working. 80% of application issues are usually caused by incorrect configuration. And that's one of the big things. Is it everybody, is it a particular browser or mobile device, is it a geographical thing? That's where you need a suite that has both an application and a network perspective. And in most cases this is enough. Admins don’t want to know about Java methods or code-level details. That’s what the devs want.
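
To make the “is it everybody or just one browser?” question concrete, here is a minimal Python sketch that computes an error rate per browser or per location. The record layout, the `error_rate_by` helper and the sample data are invented for illustration; they are not taken from any Flowmon export.

```python
from collections import defaultdict


def error_rate_by(records, key):
    """Group transaction records by `key` and return the failure rate per group."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for record in records:
        totals[record[key]] += 1
        if record["failed"]:
            errors[record[key]] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Made-up sample: every Chrome transaction fails, Firefox is fine.
records = [
    {"browser": "Chrome", "location": "London", "failed": True},
    {"browser": "Chrome", "location": "Brno", "failed": True},
    {"browser": "Firefox", "location": "London", "failed": False},
    {"browser": "Firefox", "location": "Brno", "failed": False},
]

print(error_rate_by(records, "browser"))
print(error_rate_by(records, "location"))
```

If one group stands out while the others look healthy, the problem is probably tied to that browser or location rather than to the application as a whole.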

APM is the monitoring of real user transactions. What if there are no transactions to measure, overnight or over the weekend?

Real user monitoring is great because you measure the real experience of your users. But if it’s not a 24-hour system and it doesn’t get used over the weekend, then while nobody is using it, you don’t know if it’s good or not. So you wait until Monday morning, everybody comes in and uses an application which doesn’t work. So obviously we have the Transaction Generator, and we can have that running in the background and creating synthetic transactions, whether it’s just accessing the application or logging in, filling in some form and doing these complex transactions. That means somebody will be alerted over the weekend and can hopefully make it work before people come back to work. Again, a multilayer monitoring solution is the key to 100% availability.
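
The idea of a synthetic transaction can be sketched in Python as follows. This is a generic illustration, not the Transaction Generator itself: `run_synthetic_check`, `CheckResult` and the two-second limit are hypothetical, and the transaction is passed in as a plain function so the example runs without any real application.

```python
import time
from typing import Callable, NamedTuple


class CheckResult(NamedTuple):
    ok: bool
    duration_s: float
    error: str


def run_synthetic_check(transaction: Callable[[], None],
                        max_duration_s: float = 2.0) -> CheckResult:
    """Run one scripted transaction and report whether it failed or was too slow."""
    start = time.perf_counter()
    try:
        transaction()  # e.g. open the page, log in, fill in a form
    except Exception as exc:
        return CheckResult(False, time.perf_counter() - start, repr(exc))
    duration = time.perf_counter() - start
    if duration > max_duration_s:
        return CheckResult(False, duration, "too slow")
    return CheckResult(True, duration, "")


def broken_login():
    """Stand-in for a scripted login against an application that is down."""
    raise ConnectionError("application unreachable")


result = run_synthetic_check(broken_login)
if not result.ok:
    print("ALERT:", result.error)
```

Scheduled every few minutes, a check like this would raise the alarm on Saturday night instead of leaving the failure for users to find on Monday morning.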

You've previously worked with other APM vendors. What made you join Flowmon?

It's a great company, a great group of guys. I've met a lot of people over the phone, but this is the first time I've met people face-to-face, and I know it sounds cliché, but everybody I've met was just really nice. And it's your product, so you develop it and you engineer it and your guys know it inside out. So R&D is done very quickly and you don’t have to try to work with different APIs to get your dashboard; it just works.

Check out all episodes of the Flowmon Studio series on our YouTube channel.

Artur Kane

Artur was a Progress employee.