(We suggest becoming familiar with the Data Supply Chain before reading this article.)
"Brands that declare responsibility for educating their users about data security and use have an opportunity to build trust and earn loyalty."
The likelihood that private data will be misused, hacked, or de-anonymized increases with each new advance in technology.
Gaining consent from users requires more than 'opt-in' or 'opt-out'—consent is only meaningful if users understand the impact of their choices.
Organizations have a moral, and increasingly a legal, obligation to minimize and mitigate harm to users who disclose personal data.
Inherent biases are introduced through algorithm selection, training data, and hypothesis testing, and those biases can surface in automated decision-making.
Analytics can uncover information previously unavailable: it’s already possible, in some cases, for governments to use big data analytics to discover crimes that otherwise would have remained secret. What should be done with that information? Is that an easier question to answer when the culprits are terrorists or sex offenders? What if the government in question is an oppressive regime and the crime breaks a law related to censorship? It is difficult to imagine the potential harm of unintended consequences in these areas, let alone take active steps to prepare for that harm, mitigate it, and recover from it.
One prudent approach to minimize the potential for harm is gaining informed consent from individuals disclosing data. With the increasing presence of (and reliance on) digital technologies, individuals must understand what they consent to by sharing data. Similarly, it’s important to help designers and developers minimize unintended harm from the use of that data. In the field of data science, practitioners must integrate ethical decision-making into most discussions about data to avoid unforeseen risks that arise from the complex movement of data through a wide diversity of systems. Ethical behavior in this context is about the treatment, protection, and transformation of data moving between systems (“data in motion”)—not just recorded, static data (“data at rest”). This publication explores how the concept of informed consent in a world of “data in motion” might be addressed to help apply the principle of doing no harm in the digital age.
To truly consider informed consent, it is important to understand the concepts of “data at rest” and “data in motion,” particularly in contemporary digital systems.
Traditionally, data gathered for electronic record-keeping followed the same paradigm as files in a filing cabinet. Data was recorded by a human at some point, filed away, and retrieved (and perhaps updated) as needed. Data that was no longer relevant would be discarded to make room for new data.
Early digital systems were similar: data was input by human beings, created by computer systems, or sensed within an environment, and then more or less filed away to be retrieved later, when needed.
Modern data management can be mapped to three key stages, illustrated in the code sketch that follows this list:
1. Disclosing / Sensing—humans or machines gather and record data.
2. Manipulating / Processing—aggregation, transformation, and/or analysis turns data into useful information.
3. Consuming / Applying—a person or machine uses the information to derive insights that can then be used to effect change.
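To make the three stages concrete, here is a minimal Python sketch of data passing through them. All names and record shapes are hypothetical, invented for illustration rather than drawn from any particular system.

```python
from dataclasses import dataclass

# Stage 1: Disclosing / Sensing -- a human or machine records raw data.
@dataclass
class RawReading:
    source: str   # who or what disclosed the data
    value: float  # the observation itself

def sense(source: str, value: float) -> RawReading:
    return RawReading(source=source, value=value)

# Stage 2: Manipulating / Processing -- aggregation turns data into information.
def process(readings: list[RawReading]) -> float:
    return sum(r.value for r in readings) / len(readings)

# Stage 3: Consuming / Applying -- a person or machine acts on the insight.
def apply_insight(average: float) -> str:
    return "investigate" if average > 100.0 else "no action"

readings = [sense("sensor-a", 98.0), sense("sensor-b", 104.0)]
print(apply_insight(process(readings)))  # insight derived from aggregated data
```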
Historically, digital systems were not as interoperable and networked as they are now, so data could be thought of as being "at rest": stored statically, just like the files in the filing cabinets of the past. But today, some data is in near-constant motion. When we access social media sites, we are not just pulling static data from some digital filing cabinet; we are accessing data that is in constant transformation. For example, algorithms shift which news stories are displayed to us based on an ever-evolving model of our tastes and the tastes of other users. An action taken in an app connected to a user's social media account, such as an online retailer's, could change the content delivered to that user. Managing the complexity of consent and potential harms in this environment is much harder than protecting privacy in traditional market research and mass-market broadcast advertising.
This data in motion is much harder to comprehend at scale. The chaotic effects of multiple interoperable systems and their data playing off each other make it difficult for design and development stakeholders to see the big picture of how data might affect their users, much less communicate with those users about informed consent or doing no harm. Data in motion can seem relatively straightforward in the context of the flow of interactions through each stage of disclosing, manipulating, and consuming data. However, although it is tempting to think of data as a file moving from one filing cabinet to another, it is in fact something more dynamic, manipulated in many different ways in many different locations, more or less simultaneously. The picture becomes even more ambiguous when even a lack of interaction with a piece of data can be used to draw conclusions about a user that they might otherwise keep private.
For example, ride-sharing apps need to collect location information about drivers and passengers to deliver the promised service. This makes sense in “the moment” of using the app. However, if the app’s consent agreement allows location data to be collected regardless of whether or not the driver or rider is actually using the app, a user may be passively providing their location information without being actively aware of that fact. In such cases, the application may be inferring things about that passenger’s interest in various goods or services based on the locations they travel to, even when they’re not using the app.
Given that location data may be moving through mapping APIs or used by the app provider in numerous ways, a user has little insight into the real-time use of their data or the different parties with whom that data may be shared. For users of the ride-sharing app, this may cause concern that their location data is being used to profile their time spent outside the app: information that could be significant if, for example, an algorithm determines that a driver who has installed the app is also driving for a competing ride-sharing provider.¹ Without clear consent agreements, fear about where data moves and how it is used can erode users' trust that their best interests are being served.
Disclosing / Sensing (data at rest): Data may be sourced from archives or other backups.
Guideline: Ensure the context of original consent is known and respected; data security practices should be revisited regularly to minimize risk of accidental disclosure. Aggregating data from multiple sources often represents a new context for disclosure; have the responsible parties made a meaningful effort to renew informed consent agreements for this new context?
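One way to honor this guideline is to store the original consent context alongside archived data and check it before any new use, renewing consent when the context changes. The ConsentContext record and is_use_permitted helper below are a hypothetical sketch, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class ConsentContext:
    """Records the terms under which data was originally disclosed."""
    discloser_id: str
    permitted_purposes: set[str]          # e.g. {"billing", "service-delivery"}
    consent_renewed_for: set[str] = field(default_factory=set)

def is_use_permitted(ctx: ConsentContext, proposed_purpose: str) -> bool:
    # A new context (e.g. aggregation with other sources) requires renewed consent.
    return (proposed_purpose in ctx.permitted_purposes
            or proposed_purpose in ctx.consent_renewed_for)

ctx = ConsentContext("user-123", {"service-delivery"})
assert not is_use_permitted(ctx, "cross-source-aggregation")  # renew consent first
ctx.consent_renewed_for.add("cross-source-aggregation")       # after renewal...
assert is_use_permitted(ctx, "cross-source-aggregation")      # ...use is permitted
```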
Disclosing / Sensing (data in motion): Data is collected in real time from machine sensors, automated processes, or human input; while in motion, data may or may not be retained, reshaped, corrupted, disclosed, etc.
Guideline: Be respectful of data disclosers and the individuals behind the data. Protect the integrity and security of data throughout networks and supply chains. Only collect the minimum amount of data needed for a specific application. Avoid collecting personally identifiable information or any associated meta-data whenever possible. Maximize preservation of provenance (or lineage).
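Two of these guidelines, data minimization and preservation of provenance, can be sketched in code. The field names below are invented for illustration; this is a sketch of the idea rather than a complete implementation.

```python
from dataclasses import dataclass, field

# Only the fields this application actually needs; no names, addresses, or device IDs.
REQUIRED_FIELDS = {"trip_start", "trip_end"}

@dataclass
class Record:
    data: dict
    provenance: list[str] = field(default_factory=list)  # lineage of this record

def collect(raw_input: dict, source: str) -> Record:
    # Data minimization: drop everything the application did not ask for.
    minimal = {k: v for k, v in raw_input.items() if k in REQUIRED_FIELDS}
    return Record(data=minimal, provenance=[f"collected from {source}"])

raw = {"trip_start": "08:10", "trip_end": "08:42", "email": "a@example.com"}
record = collect(raw, "mobile-app")
print(record.data)        # {'trip_start': '08:10', 'trip_end': '08:42'}
print(record.provenance)  # ['collected from mobile-app']
```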
Manipulating / Processing (data at rest): Data is stored locally without widespread distribution channels; all transformations happen locally.
Guideline: Set up a secure environment for handling static data to minimize the risk of security breaches; ensure data is not mistakenly shared with external networks. Data movement and transformation should be fully auditable.
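As a hedged illustration of the "fully auditable" requirement, an append-only log in which each entry commits to the hash of the previous one makes later tampering with the record of local transformations detectable. This is a minimal sketch, not a production audit system.

```python
import hashlib
import json

def append_entry(log: list[dict], action: str, actor: str) -> None:
    # Each entry commits to the hash of its predecessor, forming a chain.
    prev_hash = log[-1]["hash"] if log else "genesis"
    entry = {"action": action, "actor": actor, "prev": prev_hash}
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)

def verify(log: list[dict]) -> bool:
    # Recompute every hash; any edit to an earlier entry breaks the chain.
    prev_hash = "genesis"
    for entry in log:
        body = {k: entry[k] for k in ("action", "actor", "prev")}
        payload = json.dumps(body, sort_keys=True).encode()
        if entry["prev"] != prev_hash or entry["hash"] != hashlib.sha256(payload).hexdigest():
            return False
        prev_hash = entry["hash"]
    return True

log: list[dict] = []
append_entry(log, "normalized dataset-a", "etl-job-7")
append_entry(log, "joined dataset-a with dataset-b", "analyst-2")
assert verify(log)
log[0]["action"] = "deleted dataset-a"  # tampering with the history...
assert not verify(log)                  # ...is detected
```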
Manipulating / Processing (data in motion): Data is actively being moved or aggregated; data transformations use multiple datasets or API calls which might be from various parties; the public Internet may be used for data access or transformation.
Guideline: Ensure that data moving between networks and cloud service providers is encrypted; shared datasets should strive to minimize the amount of data transferred and anonymize as much as possible. Be sure to destroy any temporary databases that contain aggregated data. Are research outcomes consistent with the discloser’s original intentions?
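The anonymization guideline can be illustrated, in minimal form, by replacing direct identifiers with keyed pseudonyms before a dataset is shared, so the receiving party cannot recover the originals without the key. Real deployments need stronger protections (k-anonymity, differential privacy, and the like); this sketch shows only the basic move.

```python
import hashlib
import hmac
import secrets

# Keep this key private to the sharing organization; never ship it with the data.
PSEUDONYM_KEY = secrets.token_bytes(32)

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a keyed, irreversible pseudonym."""
    return hmac.new(PSEUDONYM_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

shared_rows = [
    {"user": pseudonymize("alice@example.com"), "trips": 12},
    {"user": pseudonymize("bob@example.com"), "trips": 3},
]
print(shared_rows)  # pseudonyms are stable within the dataset but not reversible

# Per the guideline: destroy any temporary aggregate once the transfer is complete.
del shared_rows
```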
Consuming / Applying (data at rest): Data analytics processes do not rely on live or real-time updates.
Guideline: Consider how comfortable data disclosers would be with how the derived insights are being applied. Gain consent, preferably informed consent, from data disclosers for application-specific uses of data.
Consuming / Applying (data in motion): Data insights could be context-aware, informed by sensors, or benefit from streamed data or API calls.
Guideline: The data at rest guidelines for data consumption are equally important here. In addition, adhere to any license agreements associated with the APIs being used. Encrypt data. Be conscious of the lack of control over streamed data once it is broadcast. Streaming data also carries a unique range of potential harms, such as enabling individuals to be tracked or network vulnerabilities to be deciphered.
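A minimal sketch of the encryption guideline, assuming the third-party cryptography package is available (any vetted symmetric scheme would serve equally well). Once a payload is broadcast, the sender retains control only through custody of the key.

```python
# pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # share only with authorized consumers
cipher = Fernet(key)

# Encrypt each payload before it leaves the trusted network.
payload = b'{"lat": 45.52, "lon": -122.68, "ts": "2016-05-01T12:00:00Z"}'
token = cipher.encrypt(payload)

# A consumer holding the key can decrypt; anyone else sees only ciphertext.
assert cipher.decrypt(token) == payload
```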
Creating interdependencies between multiple organizations’ datasets
Trading data among multiple organizations can make data more useful and information more insightful. This kind of "data diplomacy" requires the ability to predict potential effects when data is used in new ways or new combinations, and to record data's movement so that provenance can be traced when failures occur. Just as diplomats must consider many complex and sometimes unpredictable risks and opportunities when engaging across borders, so too must leaders and developers.
Organizations must be willing to devote at least as much time to considering the effects of this data-sharing as they devote to exploring its monetization options. They must also find a common language with other organizations, and with end users, to determine what is an effective and acceptable use of data. Informed consent requires that data diplomats, be they business executives, application developers, or marketing teams, among many others, communicate proactively about the potential pitfalls of data exposure.²
Organizations that are effective at communicating their data-sharing efforts stand to win the trust of users, who will be willing to take a measured risk in sharing their data in return for more useful data and information, less expensive services, or other benefits.
To achieve informed consent on the part of data disclosers and to maintain that consent over time, organizations must inform employees and partners of the goals for data use and of the ethical concerns and frameworks within which those uses must fall. Outreach of this type is part of a larger "data fluency," a shared understanding of how data is disclosed, manipulated, and processed, which is needed throughout the organization.
At a minimum, a data fluency program should cover how data is disclosed, manipulated, and consumed across the organization's systems, partnerships, and end-user relationships.
In the process of modeling potential uses for data, unspoken values will become apparent. If good feedback loops are in place, end-users will be able to signal to developers where their values diverge from those of the developers or where implementation does not bear out the intended result. Decision principles for data handling and management of consent need to be incorporated not just at the edges of an organization’s human decision-makers but at the edges of its computing infrastructure as well (as embodied in the algorithms used in analytics or machine-learning processes).
Decision principles, which can be defined as guidelines for effective improvisation, can be used to achieve this requirement. The concept comes originally from military doctrine, in which commanders empower soldiers on the front lines with rules of engagement that not only proscribe or restrict their actions but also give them enough information to make smart decisions in line with a larger strategy, even when they cannot communicate directly with the command structure.²,³
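One hedged way to push decision principles to the edges of computing infrastructure is to encode them as declarative, machine-checkable rules that automated processes consult before acting, allowing a pipeline to improvise within bounds even when no human is in the loop. The rule set below is invented purely for illustration.

```python
# Hypothetical decision principles, expressed as machine-checkable rules.
PRINCIPLES = {
    "allow_pii_in_analytics": False,
    "max_retention_days": 90,
    "purposes_requiring_human_review": {"law-enforcement-request"},
}

def authorize(action: dict) -> str:
    """Return 'allow', 'deny', or 'escalate' for a proposed data-handling action."""
    if action.get("contains_pii") and not PRINCIPLES["allow_pii_in_analytics"]:
        return "deny"
    if action.get("retention_days", 0) > PRINCIPLES["max_retention_days"]:
        return "deny"
    if action.get("purpose") in PRINCIPLES["purposes_requiring_human_review"]:
        return "escalate"  # smart decision deferred to the command structure
    return "allow"

print(authorize({"purpose": "ad-targeting", "contains_pii": True}))         # deny
print(authorize({"purpose": "law-enforcement-request"}))                    # escalate
print(authorize({"purpose": "service-improvement", "retention_days": 30}))  # allow
```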
Wise leaders will attend to the management of consent and prevention of harm through good design, seeking informed consent from users, monitoring and managing consent over time, and creating harm-mitigation strategies. With data fluency programs in place in their organizations, embodied in clear decision principles, their teams and partners will be able to fulfill the promises companies have made to their users: both to avoid harm and to create new value.
MJ Petroni
Cyborg Anthropologist and CEO, Causeit, Inc.
Jessica Long
Cyborg Anthropologist, Causeit, Inc.
Steven C. Tiell
Senior Principal—Digital Ethics
Accenture Labs
Harrison Lynch
Accenture Labs
Additional Contribution from Scott L. David
University of Washington
Data Ethics Research Initiative
Launched by Accenture’s Technology Vision team, the Data Ethics Research Initiative brings together leading thinkers and researchers from Accenture Labs and over a dozen external organizations to explore the most pertinent issues of data ethics in the digital economy. The goal of this research initiative is to outline strategic guidelines and tactical actions businesses, government agencies, and NGOs can take to adopt ethical practices throughout their data supply chains.
This document makes descriptive reference to trademarks that may be owned by others.
The use of such trademarks herein is not an assertion of ownership of such trademarks by Accenture or Causeit, Inc. and is not intended to represent or imply the existence of an association between Accenture, Causeit, Inc. and/or the lawful owners of such trademarks.
© 2016 Accenture. All rights reserved. This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA. Accenture is a trademark of Accenture.