Simply stated, data sensemaking is what we do to make sense of data. We do this in an attempt to understand the world, based on empirical evidence. Those who work to make sense of data and communicate their findings are data sensemakers. Data sensemaking, as a profession, is currently associated with several job titles, including data analyst, business intelligence professional, statistician, and data scientist.
Helping people understand the world based on data is important work. Without understanding, we often make bad decisions. When done well, data sensemaking requires a broad and deep set of skills and a commitment to ethical conduct. When data sensemaking professionals fail to do their jobs well, whether through a lack of skills or through ethical misconduct, confusion and misinformation result, and these encourage bad decisions that do harm. Making sense of data is not ethically or morally neutral; it can be done for good or ill. “I did what I was told” is not a valid excuse for unethical behavior.
In recent years, misuses of data have led to a great deal of discussion about ethics related to invasions of privacy and discriminatory uses of data. Most of these discussions focus on the creation and use of analytical algorithms. I’d like to extend the list of ethical considerations to address the full range of data sensemaking activities. The list of ethical practices that I’m proposing below is neither complete nor sufficiently organized nor fully described. I offer it only as an initial effort that we can discuss, expand, and clarify. Once we’ve done that, we can circle back and refine the work.
The ethical practices that can serve as a code of conduct for data sensemaking professionals are, in my opinion, built upon a single fundamental principle. It is the same principle that is traditionally associated with the oath that medical doctors take: Do no harm.
Here’s the list:
- You should work, not just to provide information, but to enable understanding that can be used in beneficial ways.
- You should develop the full range of skills that are needed to do the work of data sensemaking effectively. Training in a data analysis tool is not sufficient. This suggests the need for an agreed-upon set of skills for data sensemaking.
- You should understand the relevant domain. For instance, if you’re doing sales analysis, you should understand the sales process as well as the sales objectives of your organization. When you don’t understand the domain well enough, you must involve those who do.
- You should know your audience (i.e., your clients; those who are asking you to do the work)—their interests, beliefs, values, assumptions, biases, and objectives—in part to identify potentially unethical inclinations.
- You should understand the purpose for which your work will be used. In other words, you should ask “Why?”
- You should strive to anticipate the ways in which your findings could be used for harm.
- When asked to do something harmful, you should say “No.” Furthermore, you should also discourage others from doing harm.
- When you discover harmful uses of data, you should challenge them, and if they persist, you should expose them to those who can potentially end them.
- You should primarily serve the needs of those who will be affected by your work, who are not necessarily those who have asked you to do the work.
- You should not examine data that you or your client have no right to examine. This includes private data that you have not received explicit permission to examine. To honor this, you must acquaint yourself with data privacy laws, but you should not limit your concern to data that has been legally deemed private; data that could reasonably be considered private deserves the same care.
- You should not do work that will result in the unfair and discriminatory treatment of particular groups of people based on race, ethnicity, gender, religion, age, etc.
- If you cannot enable the understanding that’s needed with the data that’s available, you should point this out, identify what’s needed, and do what you can to acquire it.
- If the quality of the data that’s available is insufficient for the data sensemaking task, you should point this out, describe what’s lacking, and insist that the data’s quality be improved to the level that’s required before proceeding.
- You should always examine data within context.
- You should always examine data from all potentially relevant perspectives.
- You should present your findings clearly.
- You should present your findings as comprehensively as necessary to enable the level of understanding that’s needed.
- You should present your findings truthfully.
- You should describe the uncertainty of your findings.
- You should report any limitations that might have had an effect on the validity of your findings.
- You should confirm that your audience understands your findings.
- You should solicit feedback during the data sensemaking process and invite others to critique your findings.
- You should document the steps that you took, including the statistics that you used, and maintain the data that you produced during the course of your work. This will make it possible for others to review your work and for you to reexamine your findings at a later date.
- When you’re asked to do work that doesn’t make sense or to do it in a way that doesn’t make sense (i.e., in ways that are ineffective), you should propose an alternative that does make sense and insist on it.
- When people telegraph what they expect you to find in the data, you should do your best to ignore those expectations or to subject them to scrutiny.
As data sensemakers, we stand at the gates of understanding. Ethically, it is our job to serve as gatekeepers. In many cases, we will be the only defense against harm.
I invite you to propose additions to this list and to discuss the merits of the practices that I’ve proposed. If you are part of an organization that employs other data sensemakers, I also invite you to discuss the ethical dimensions of your work with one another.