There are two types of data scientists out there—theoretical and application data scientists. Which do you need for your analytics project?
Almost every day I come across a company offering courses for people who want to become data scientists. The funny thing is that the promised time keeps getting shorter: “Become a data scientist in 30 days. No, seven days. No, one day.” There is a huge craze behind this, with hundreds of thousands of people enrolling in these courses. And why not? It is one of the hottest and highest-paying jobs in the world and, according to McKinsey, there will be a shortage of roughly a hundred thousand data scientists by 2020. There are plenty of jobs out there for sure.
If the demand is there and all it takes is 30 days to become a data scientist, what is causing this gap? It should be easy to re-skill existing staff and get the work done. That raises the question: how are data scientists differentiated? What should you as a company be looking for when you are trying to hire one? Does every data scientist need a PhD, or would the 30-day data scientist work in your case? The breakdown below should help:
1. Theoretical Data Scientist
Who are they?
The first category of data scientists I would like to highlight is the top of the league: people with a PhD or master's degree in computer science, math or statistics who are deeply grounded in the subject. They have spent years researching data science algorithms, understand the math well and have contributed to the library of algorithms as we know it.
Where are they found the most?
They are a rare breed of specialist, found mostly in analytics product companies or in companies doing cutting-edge work in data science, such as Google and Facebook. They are very research oriented.
When do you need them?
You need a data scientist of this quality when you know your problem is unique and cannot be solved using existing libraries or requires intense research or a completely new approach outside of what is publicly available. Hence, they are mostly found in the companies mentioned above.
2. Application Data Scientist
Who are they?
This is the fastest-growing breed of data scientists. People in this group have a background in coding/computer science and math but did not specialize in data science earlier in their careers. They were either accidentally pushed into this field or jumped onto the bandwagon when they realized that all their peers were doing so too. Their usual starting point is an online course on a platform like Coursera or Udacity, and they then pick up concepts as they experiment their way through the woods.
Where are they found the most?
BI and analytics services companies, companies where analytics/IT is a non-core operation, and companies that help customers get on the analytics bandwagon. The problems they solve are of medium complexity. They may not understand the exact math behind an algorithm, but they can easily tell you its application, its tuning parameters and what it will take to create a model.
When do you need them?
In multiple places. They are great for helping to build an analytics establishment under a senior theoretical data scientist. They are more affordable and hence can be hired in larger numbers for experimentation. They can also manage analytics programs, especially those that involve vendors, products and so on. Their big value-add is that they can get you going fast without needing to go very deep into complexity.
Clearly, not all data scientists are specialists, and not all applications of data science require PhDs. It really boils down to one question: is the problem you are trying to solve complex enough to hire a theoretical data scientist to invent the wheel, or can an application data scientist experiment their way through the maze and get you the answer?
Abhishek Tandon
Abhishek is a data junkie who lives and breathes solving customer problems using analytics. He has a breadth of experience - from implementing large-scale enterprise data warehouses to helping manufacturers analyze asset behavior and predict failures. Due to his business background, he has a unique ability to understand functional requirements and translate them into technology solutions. He is part of the customer success team and leads solution engineering initiatives, traveling all over the world to explain how Progress DataRPM can help companies save millions of dollars.