Data science is extremely valuable for almost every business and is rapidly becoming the primary catalyst for product innovation. By finding insights, making accurate predictions, and adapting on the fly, data-driven organizations are building a significant advantage over their competitors. If you want to implement data science successfully within your organization, there are a few things to consider.
Your First Hire
Generally, I tell people to avoid trying to hire the mythical data science unicorn. This creature, who can manage the engineering process, lead modeling efforts, coordinate the product roadmap, and articulate results to stakeholders and leadership, has reportedly been seen roaming the campuses of companies like Google, Airbnb, and Amazon but is, in reality, pretty rare. While you might get lucky, you will eventually realize that data science is composed of multiple disciplines, and anyone who can lead all phases is in very high demand.
Assuming a unicorn doesn’t land in your lap, the primary skill sets to focus on initially are leadership, communication, and data engineering. Most scale-ups will need to focus on storing data properly, sampling it at appropriate intervals, and building the indicators that will go into the model before doing any advanced research. To architect this framework, your first hire will still need a strong machine learning background. They just might not be a PhD-level researcher.
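To make "sampling at appropriate intervals" and "building indicators" a little more concrete, here is a minimal sketch using only the Python standard library. The function names (resample_hourly, rolling_mean), the hourly interval, and the sample readings are all illustrative assumptions, not a prescribed pipeline; a real team would likely lean on a database or a data-frame library for this.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def resample_hourly(events):
    """Average raw (timestamp, value) readings into one value per hour."""
    buckets = defaultdict(list)
    for ts, value in events:
        # Truncate each timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return [(hour, mean(values)) for hour, values in sorted(buckets.items())]


def rolling_mean(series, window):
    """Trailing moving average over the last `window` samples (an example indicator)."""
    out = []
    for i, (ts, _) in enumerate(series):
        recent = [v for _, v in series[max(0, i - window + 1): i + 1]]
        out.append((ts, mean(recent)))
    return out


# Hypothetical raw readings: one every 20 minutes for three hours.
start = datetime(2024, 1, 1)
raw = [(start + timedelta(minutes=20 * i), float(i)) for i in range(9)]

hourly = resample_hourly(raw)              # one sample per hour
features = rolling_mean(hourly, window=2)  # indicator a model could consume
```

The point of the sketch is the order of operations: store the raw readings, resample them to a consistent interval, and only then derive the indicators that feed a model.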
As your team grows, you will need to start thinking about team structure. Generally, machine learning teams are composed of scientists, engineers, analysts, and managers. Let’s review each role and its responsibilities.
Data Engineers: Data Engineers are responsible for building and maintaining the technical infrastructure required for modeling, predictions, and analysis. They create and maintain databases, machine learning pipelines, and production processes. Without properly stored data, modeling processes, and the ability to serve predictions in production, a Data Scientist is essentially useless.
Data Scientists: Once the initial groundwork has been laid, a Data Scientist owns the modeling process. Generally, they take input from product or other team leads to understand the model’s business objective, then articulate requirements to the engineers and other stakeholders. Once these criteria have been defined, the process of building tests, training models, and evaluating performance begins.
Data Analysts: As your team continues to grow and scale, Data Analysts become a very important part of the team. Having started my career in this position, I have a deep respect for the value a machine learning analyst can provide to a mature team. Analysts monitor processes, evaluate data quality, and track production model performance. These steps seem relatively routine, but a model is never “complete” and will always require some oversight, so appointing an analyst to manage the process makes sense. This allows your more senior team members to focus on innovation instead of maintenance.
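As a rough illustration of the routine checks an analyst might own, the sketch below computes a missing-value rate for one field and flags a model for review when its recent accuracy falls too far below a baseline. The helper names, the "age" field, the 0.80 baseline, and the 0.05 tolerance are hypothetical values chosen for the example, not recommended thresholds.

```python
def missing_rate(rows, field):
    """Fraction of rows where `field` is absent or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(field) is None)
    return missing / len(rows)


def accuracy(predictions, actuals):
    """Share of predictions that match the observed outcomes."""
    correct = sum(1 for p, a in zip(predictions, actuals) if p == a)
    return correct / len(actuals)


def check_model(predictions, actuals, baseline_accuracy, tolerance=0.05):
    """Return (current_accuracy, needs_review).

    needs_review is True when accuracy drops more than `tolerance`
    below the baseline measured when the model was deployed.
    """
    current = accuracy(predictions, actuals)
    return current, current < baseline_accuracy - tolerance


# Hypothetical batch of incoming records and recent predictions.
rows = [{"age": 31}, {"age": None}, {"age": 45}, {}]
preds = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
actual = [1, 0, 0, 1, 0, 0, 0, 1, 1, 1]

age_missing = missing_rate(rows, "age")
current, needs_review = check_model(preds, actual, baseline_accuracy=0.80)
```

Checks like these are simple to write, but someone has to run them on a schedule and act on the results, which is the ongoing oversight described above.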
Managers: As the data team and the number of models grow, the need for a Data Science Manager appears. This person coordinates the scientists, engineers, and analysts and manages external demand on the data science team. The Data Science Manager essentially guides the process, allocates resources, and occasionally shields the team from ad hoc requests so they can achieve their primary objectives.
As you will hopefully find out for yourself, growing into a mature data science organization is an extremely fun and transformative experience. By breaking the team down into roles and filling them as needed, you will increase your odds of success in the long run.