How to guide your team and improve data-driven business cases
by Martin Schmidt and Marcel Hebing
A growing number of data science use cases are emerging in companies across nearly every sector. From small businesses to large industries, the number of data scientists is continuously increasing, and with it the size and complexity of data science teams. At the same time, it is reported that only a few (22%) data science projects show high revenue and that big data projects fail in large numbers (60 to 85%) [Atwal 2020]. This leads to the question: how can companies deal with the complexity of managing data science teams?
In this article we want to:
- define what data science management is,
- make a suggestion for the promotion of a data science culture,
- describe the tasks of a data science manager,
- outline the need for data science managers in companies,
- and give a glimpse of how to find the right person for the job.
What is Data Science Management?
Data scientists are well-trained information scientists, statisticians, natural scientists, social scientists, or mathematicians. Some even studied data science as a stand-alone bachelor’s or master’s course. They solve problems, challenge well-trodden paths and count the countables. They give insights into complex processes, analyze huge datasets and face problems not tackled before. They help save time, automate processes and build the future in many ways. But on some occasions, they love solving problems so much that they lose focus. That’s when the data science manager comes into play.
Data science management is a subdivision of management, not data science. Data science managers are appointed to represent and live the vision of the company and to achieve its goals. To do so, managers need to empower, steer and inspire people and encourage teams. They are given resources to support them in this mission (compare Servant Leadership [Eva et al. 2019] and [Chang 2019] for more on that topic). They work best if they can avoid micromanaging their teams, keep focusing on the bigger picture, and translate both the real-world application of a project to the data scientists and the results to everyone else. Moreover, they need to have at least a basic understanding of the fundamentals of data science, the nature of iterative projects and the academic background of most data scientists. First and foremost, data science managers need to be good communicators in every direction.
To conclude, we understand data science management as follows:
– Data science management means organizing an iterative process of extracting insights from data with scientific methods and highly automated software infrastructures pursuing business goals.
Why should companies care?
Digitalization and digital transformation have impacted many business models, and disruptive technologies have rendered formerly robust business cases obsolete. That’s why many businesses have to focus on cost efficiency, which calls for further process optimization. As many industries are already highly efficient, only automation, optimization and the use of dormant potential can save money. Automation produces a lot of data, and the same holds true for modern digital machinery. Moreover, many other data sources from different contexts, like social media, weather forecasts or even data generated by competitors, are accessible.
To achieve a competitive advantage, companies need to unlock the potential of various data sources and blend them to gain more and better knowledge. In many cases, people think that more data is better (“big data”), which is not entirely accurate. There needs to be the right data with the right relationship to other datasets. This is often called “smart data”. Smart data focuses on the way that data (including big data) are integrated and prepared to provide real business value [Iafrate 2015].
Many companies nowadays are hiring data scientists. The employer-employee relationship risks ending in disillusionment for both sides. The companies are often not prepared, in terms of data infrastructure and facilities, for highly skilled people. They are overchallenged and fail to place data scientists in a setting where they can fulfill their potential as quite an expensive workforce. Data scientists, on the other hand, are sometimes underchallenged in such environments and face boredom. As they do jobs that are below their skill level, they get frustrated. As a consequence, they are less creative and don’t unlock the potential for the company. These conditions have a twofold effect: the companies pay for expensive data scientists, and those data scientists are unable to realise the potential of data science for the business.
Promotion of a Data Science Culture
Ideally, a data science manager is aware of all these things beforehand and is able both to prepare the company before hiring data scientists and to place the data scientists in the best environment. Figure 1 lists some suggestions of what can be done on an everyday basis to achieve an open-minded data science culture:
Figure 1: Towards a fruitful culture for data science: If the above-mentioned things (and more) are in the DNA of a company, it is likely that there is the right setting for successful data science projects. Image by authors (CC BY 4.0).
A data science manager should be ahead of things: he or she must be able to scope upcoming topics in the company, so that data scientists can take a different and deeper focus on them. Data science managers remove barriers (especially in terms of data accessibility) and facilitate communication between subject areas [Bhatti 2017]. As data science is a team sport, they should encourage people in order to strengthen the sense of togetherness.
Finally, a data science manager should have some degree of freedom. Since they work in an ever-evolving and dynamic field, they should have the appropriate resources needed to attend conferences and working group meetings. Those things should be seen as advanced training. Companies should recognise managers as a kind of task force and avoid micromanagement. Data science managers are not the overhead of data science. Data science managers are the ambassadors and enablers of proper data science.
Tasks for Data Science Managers
When it comes to the daily business of a data science manager, some tasks and duties (Fig. 2) are recurring and need to be kept track of. The procedures are not that different from those in a typical software engineering project; however, some aspects stand out.
Figure 2: The triangle of information flow for data science management. The blue arrows indicate a relationship towards the person who pulls information. The grey arrows indicate a push of information towards the receiver. While the data science manager is in close contact with both his/her team and the stakeholders, the push relationship between the data science team and the stakeholders is interrupted as the data science team should stay in the Team Bubble. Besides managing relationships, the data science manager has the essential project management duty of ensuring the overall success of a project. Image by authors (CC BY 4.0).
Task 1) Requirements Management
In most data science projects, the first step is to talk to stakeholders and find out what they need. This is mostly about extracting information and understanding the real-world business problems [see also “Business Understanding” in Shearer 2000]. It is important to talk about expectations, and the conversation should finally lead to an answer to the question: What will be different for the stakeholders once the data science project is successfully finished?
The recorded requirements then need to be translated into analytical tasks for the data scientists [Provost and Fawcett 2013]. These tasks have to be cut into digestible pieces. The technical or scientific depth can be discussed with the data scientists. This could be done by putting all the items in a backlog and by writing user stories, as is common in software development [Scrum.org].
Task 2) Time & Resources
Dealing with complex problems often means dealing with uncertainty. At the same time, complexity needs to be reduced in order to estimate the project budget and therefore the money that is available to be spent. For the stakeholders it might be helpful to pin a price tag on the user stories, the individual requirements or the project phases by estimating the time and effort. However, dealing with complexity means dealing with various kinds of unknowns. It is good practice to add a time buffer to estimations according to the level of uncertainty. A useful approach is to sort user stories on the canvas of uncertainty (see Fig. 3, Rumsfeld’s Matrix) to get a first impression.
Figure 3: This Rumsfeld’s Matrix can be used as an uncertainty canvas [Fournet 2019]. Which factors are already known (and can be calculated)? Where is information missing that someone else has (go and ask)? Which information is known to stay unknown (e.g., stock prices, the weather)? And how many completely unexpected things usually happen? Some software managers even multiply their estimates by pi (~3.14159) to account for the unexpected [Strom 2008]. Image by authors (CC BY 4.0).
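As a rough illustration of buffering estimates by level of uncertainty, the following Python sketch assigns each user story to one quadrant of the canvas and multiplies its base estimate by a buffer factor. The quadrant names, the example stories and all factors except pi are hypothetical assumptions made for this sketch; only the pi multiplier for the unexpected is taken from [Strom 2008].

```python
import math
from dataclasses import dataclass

# Buffer multipliers per quadrant of the uncertainty canvas.
# ASSUMPTION: all values except pi are illustrative, not from the article.
BUFFER_FACTORS = {
    "known": 1.0,           # factors already known, can be calculated
    "ask_someone": 1.25,    # missing information that someone else has
    "stays_unknown": 1.5,   # known to stay unknown (e.g., the weather)
    "unexpected": math.pi,  # completely unexpected things [Strom 2008]
}


@dataclass
class UserStory:
    title: str
    base_estimate_days: float  # estimate without any buffer
    quadrant: str              # one of the keys in BUFFER_FACTORS


def buffered_estimate(story: UserStory) -> float:
    """Scale the base estimate by the buffer factor of the story's quadrant."""
    return story.base_estimate_days * BUFFER_FACTORS[story.quadrant]


def project_estimate(stories: list[UserStory]) -> float:
    """Sum the buffered estimates of all stories in the backlog."""
    return sum(buffered_estimate(story) for story in stories)


# Hypothetical backlog for illustration only.
backlog = [
    UserStory("Load sales data", 2.0, "known"),
    UserStory("Clarify access to partner data", 4.0, "ask_someone"),
    UserStory("Model weather effects on demand", 6.0, "stays_unknown"),
    UserStory("Move the model to a new platform", 3.0, "unexpected"),
]

total_days = project_estimate(backlog)
print(f"Buffered project estimate: {total_days:.1f} days")
```

In practice the factors would be calibrated against how far past estimates were off; the point of the sketch is only that the buffer grows with the uncertainty of the quadrant.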
It is essential that the people who bring the right skills to a project are on hand. Furthermore, they must also have the time to run it. It is therefore important not to overload people with multiple projects. The more projects they are working on, the more time is lost due to transaction costs. Approximately 20% of the working time is lost per additional project due to context switching [Weinberg 1975]. Having only a single person with a particular skill might be a risk to the whole project if, for example, this person is unable to work or leaves the company. In addition, data accessibility needs to be clarified beforehand. There is nothing more inefficient than data scientists and other team members sitting idle and waiting for the essential data they need.
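To make the cost of context switching concrete, here is a minimal Python sketch based on the rule of thumb of roughly 20% of working time lost per additional project [Weinberg 1975]. The function names and the simple linear model are assumptions made for this sketch, not a formula taken from the cited source.

```python
def productive_share(num_projects: int, loss_per_extra: float = 0.20) -> float:
    """Share of working time left for actual work across all projects.

    ASSUMPTION: a linear loss of `loss_per_extra` for every project
    beyond the first, floored at zero.
    """
    if num_projects < 1:
        raise ValueError("num_projects must be at least 1")
    lost = loss_per_extra * (num_projects - 1)
    return max(0.0, 1.0 - lost)


def share_per_project(num_projects: int, loss_per_extra: float = 0.20) -> float:
    """Productive share of working time available to each single project."""
    return productive_share(num_projects, loss_per_extra) / num_projects


for n in range(1, 6):
    print(
        f"{n} project(s): {productive_share(n):.0%} productive in total, "
        f"{share_per_project(n):.0%} per project"
    )
```

Under this model, a person split across three projects has about 60% of their time left for productive work, which is only about 20% per project.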
To get an overview of time and resources, the following questions are a good starting point [according to Seiter 2019]:
- Team: Which expertise is required? Are there the right people on board and do they have sufficient time to realize the project?
- Data: What data is available (in-house, open data, or for purchase) and can it be used to solve the analytical problems? Is it necessary to collect additional data (e.g., with a survey)?
- Infrastructure: Is everyone appropriately equipped with software, hardware, and cloud resources?
The above-mentioned questions help to find additional stakeholders who haven’t been thought of from the beginning. Moreover, they might clarify who else needs to be on board (e.g., the IT or legal department).
Task 3) Promotion
The project’s progress and results need to be presented to the stakeholders in a way that keeps everyone on the same page. The data science manager needs to be prepared for questions that are no longer one-dimensional: as the product or solution evolves, people tend to become more critical, because they can now judge something tangible instead of relying purely on their imagination, as they did when the requirements were gathered.
Therefore it is sometimes helpful to review preliminary results regularly, as is done in sprints within Scrum [see Scrum.org]. In many cases, the right involvement of stakeholders is important, as there will be future projects. If stakeholders are satisfied with the results and with their participation, they are more likely to be generous in the future.
Task 4) Frame & Context
Let’s get back to the team. Everybody needs to understand the road map and the vision of the project and be aware of the time frame. This includes a deep understanding of what’s going on and the obligation to speak up if something is off track. Sometimes it is hard to be the killjoy, especially with data scientists who love to solve problems and lose track as they dig deeper. But not every interesting problem lies within the scope of the business. Here it is important to show empathy and explain why the focus lies elsewhere.
Task 5) Communication Facilitation
As far as it is beneficial for the project, facilitation of communication between data scientists, stakeholders and other potentially involved people is a fundamental task for every data science manager. Most importantly, a common understanding of, for example, processes, methods and goals between all parties involved needs to be supported.
In Figure 2, the communication facilitation task is modeled as a pull relation, whereby the team gets the information it needs from stakeholders. This is a crucial aspect of the model: the team gets the information it needs and, at the same time, it is protected from unchecked stakeholder requests (see next point).
Task 6) Team Bubble
Problems need to be kept away from data scientists as best as possible, in order to make sure they have the best environment to do their jobs. It might be helpful to have Coding Days or reserve the morning hours for concentrated work.
However, the team bubble might be far from realistic, as tasks and requirements need to be refined with the stakeholders. A data science manager often does not have all the information to hand, which is why data scientists need to be involved in meetings.
[Graham 2009] distinguishes makers (the team) from managers (data science managers and most of the stakeholders) in terms of the schedules on which they are most productive. While it may be perfectly normal for data science managers and stakeholders to plan their days in one-hour intervals, rushing from one meeting to the next, this is the worst schedule to impose on the makers in a team. They usually need long intervals (4+ hours) of uninterrupted working time. And uninterrupted means uninterrupted: even a benign-seeming 5-minute talk can be devastating, as it can take a maker 20–30 minutes to get back on track.
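The effect of interruptions on a maker’s day can be checked with a small Python sketch. It splits a workday into focus blocks, adds a recovery time after every interruption and tests whether a block of at least four hours remains. The function names, the example schedules and the 25-minute recovery time (the midpoint of the 20–30 minutes mentioned above) are assumptions made for illustration.

```python
def focus_blocks(day_start, day_end, interruptions, recovery=25):
    """Return uninterrupted focus blocks as (start, end) tuples.

    All times are minutes since midnight. `interruptions` is a list of
    (start, end) tuples. ASSUMPTION: after each interruption, `recovery`
    minutes pass before focused work resumes.
    """
    blocks = []
    cursor = day_start  # earliest moment focused work can (re)start
    for start, end in sorted(interruptions):
        if cursor < day_end and start > cursor:
            blocks.append((cursor, min(start, day_end)))
        cursor = max(cursor, end + recovery)
    if cursor < day_end:
        blocks.append((cursor, day_end))
    return blocks


def has_maker_block(blocks, min_minutes=240):
    """True if at least one block offers `min_minutes` of focused time."""
    return any(end - start >= min_minutes for start, end in blocks)


# Workday from 09:00 to 17:00, expressed in minutes since midnight.
DAY_START, DAY_END = 9 * 60, 17 * 60

# A single 5-minute chat at 11:00 (hypothetical schedule).
maker_day = focus_blocks(DAY_START, DAY_END, [(660, 665)])

# Three one-hour meetings at 10:00, 13:00 and 15:00 (hypothetical schedule).
manager_day = focus_blocks(
    DAY_START, DAY_END, [(600, 660), (780, 840), (900, 960)]
)

print(maker_day, has_maker_block(maker_day))
print(manager_day, has_maker_block(manager_day))
```

With these example schedules, the single short chat still leaves one long block in the afternoon, whereas the three scattered meetings leave no block close to four hours.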
Task 7) Project Management Essentials
In most data science projects, project management essentials have to be considered as well. Once time and resources are figured out (see above), a timeframe can be crafted. This might be more or less difficult, depending on the preferred project management methodology. Some things are essential, like:
- keeping track of the goals, monitoring the overall progress and controlling the financial resources;
- assuring quality of both the process and the results;
- managing documentation, todo lists, boards and meetings;
- keeping in mind vacations, sick leave and advanced training times;
- and keeping the team happy!
How to find a Data Science Manager
“Managers coming from software engineering or business side functions (finance, accounting, etc.) can have trouble understanding data scientists and defining their project work” [see Jee 2019].
Managing data science is not the same as managing assets or resources. Besides hardware and software, the most important thing is people. In data science, most people have an academic background and frequently hold a PhD in a particular field of research. That’s different from conventional software engineering or most other business units [see Jee 2019]. This high level of education often comes with equally high demands on the conditions of employment, in terms of meaning, responsibility, participation and, of course, salary. The same holds true for a data science manager.
Even though some universities are beginning to train data science managers, it will probably take a few years until the new job profile is broadly established. Meanwhile, hiring a data science manager often means hiring a person who specialised highly during his or her former career and now wants to broaden his or her focus. In effect, it means hiring a nerd with a special interest in communication and management skills: admittedly, a rare set of skills.
A data science manager should understand analytical thinking, scientific methodology, deductive reasoning and dealing with complexity. Moreover, he or she should be a bright mind and curious person. A data science manager is someone who is able to give a proper presentation in front of customers or the management board while also being held in great esteem within his or her team for being able to sum up every calculation to 42 (if you are not familiar with references like this, you should really hire a data science manager as fast as possible).
Data science management is important for companies that want to improve their business with data-driven decisions. It means organizing an iterative process of extracting insights from data with scientific methods and highly automated software infrastructures in order to pursue business goals. The best practice approach for data science projects comprises requirements management, consideration of time and resources, the promotion of projects towards stakeholders and guidance of and from the team. Moreover, data science management is communication facilitation, building a ‘bubble’ for focussed work and project management essentials.
As of today, analytical thinking, programming skills and an understanding of data science are already an important part of business development. For many companies it is inevitable that they will develop digital business models that are based on data mining and data analytics. As the evolution of data-driven business models is a complex process with many hurdles, it would be wise to train or hire a data science manager. Data science management will become a stand-alone job profile in the near future.
H. Atwal, Practical DataOps (2020), Apress, DOI: 10.1007/978-1-4842-5104-1
N. Eva, M. Robin, S. Sendjaya, D. van Dierendonck and R. C. Liden, Servant Leadership: A Systematic Review and Call for Future Research (2019), The Leadership Quarterly, DOI: 10.1016/j.leaqua.2018.07.004
R. Chang, So You Want to Become a Data Science Manager? (2019), Deliberate Data Science
F. Iafrate, From Big Data to Smart Data (2015), John Wiley & Sons, Inc., DOI: 10.1002/9781119116189
B. Bhatti, What Are the Qualities of a Great Data Science Manager? (2017), Towards Data Science
C. Shearer, The CRISP-DM Model: The New Blueprint for Data Mining (2000), Journal of Data Warehousing
F. Provost and T. Fawcett, Data Science for Business (2013), O’Reilly Media, Inc., ISBN: 9781449361327
B. Fournet, How to use the “Knowns” and “Unknowns” Technique to Manage Assumptions (2019), The Persimmon Group
D. Strom, For Technology Projects, Multiply by Pi (2008), Baseline
G. M. Weinberg, An Introduction to General Systems Thinking (1975), New York: Wiley
M. Seiter, Business Analytics (2019), Vahlen, ISBN: 978-3-8006-5871-8
K. Jee, Why is it so Hard to Find Great Data Science Managers? (2019), Towards Data Science
Martin Schmidt works as a Data Science Manager for Deutsche Bahn AG. He is a lecturer at Digital Business University of Applied Science and Research Associate at Humboldt Institute for Internet and Society. By training, Martin is an agricultural scientist and holds a PhD in environmental modeling.
Marcel Hebing is professor of Data Science at Digital Business University of Applied Science. He is founder and Data Scientist at Impact Distillery and Research Associate at Humboldt Institute for Internet and Society. Marcel holds a PhD in Statistics, Computer Science and Social Science.