Chapter 5 | Data Supply Chain Guidebook

Analyze

During the analyze phase, data turns into meaningful information.

This is a crucial point in data's journey. When we talk about data, we may sometimes think of it as being the same throughout the supply chain, but most data is essentially useless until you apply the proper analysis to it. Think about the difference between the raw data generated by your banking transactions vs. fraud alerts that come from your bank analyzing transactions effectively.

In most situations, unanalyzed data is not of much use to humans. It may be hard to interpret, in machine-only form, or overwhelming in quantity. We'll look at key types of analysis here and discuss further applications of analyzed data in the next stage, Use.

Autonomous vehicle example

Comparing raw data to map and traffic data

In the autonomous vehicle example, a company's algorithms analyze raw data from all vehicles in an area and compare it to map and traffic data to see how their cars are faring in different parts of a city.

Common reasons to perform analysis of data:

Describe

Describe what happened in the past or what is happening in the present.

Example: determine last quarter's profitability or today's air quality level.

Predict

Predict trends or potential outcomes with some degree of certainty, based on past and present data.

Example: forecast next quarter's sales or tomorrow's weather.

Recommend

Recommend options to humans or machine systems based on a variety of inputs.

Example: suggesting movies or news stories to users based on what their peers have watched or read, or proposing changes to text based on the tone a reader might expect.

Decide

Decide the next action without ambiguity—and without additional human input.

Example: accept or reject a user's request for credit.

Diagnose

Diagnose the underlying cause(s) of a result.

Example: identify which part is broken in a car's motor, or find the source of new sales activity.

Discover

Discover new opportunities or combinations within a particular domain.

Example: find unidentified features in photographs, trending discussions on social media, or commonalities between successful investment strategies.

There are additional uses of data that don't require analysis, covered in the next stage, Use. If you don't plan to analyze data, it's still helpful to think about what analyses could be performed on it by other users in the future so that you consider potential ethical and business implications of your data gathering.

Discussion Prompt

What analysis will you do of the data?

Algorithms and Machines that Learn

We've all heard the term 'algorithm', but what does it mean?

An algorithm is a step-by-step method for solving a problem, expressed as a series of decisions—like a flow chart or decision tree. Some advanced algorithms can 'learn' and update their decision trees using models that adapt over time. This is usually referred to as machine learning.

To compare simple algorithms to machine learning, consider the relatively simple question “What do I do when I encounter a stop sign?” versus the complexity of the question “How long will it take to drive the kids to school based on current traffic?”
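The contrast above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trip records, the 0 to 10 traffic scale, and the function names are hypothetical, and a straight-line fit stands in for far more sophisticated learning methods.

```python
from statistics import mean

def stop_sign_action(speed_kmh):
    """A fixed rule: the answer never changes, no matter how much data arrives."""
    return "brake to a stop" if speed_kmh > 0 else "check for traffic, then go"

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to past observations."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar

# Hypothetical past trips: (traffic level from 0 to 10, minutes to reach school)
trips = [(1, 12), (3, 16), (5, 20), (8, 26)]
slope, intercept = fit_line([t for t, _ in trips], [m for _, m in trips])

def estimate_minutes(traffic_level):
    """An estimate that depends on what the model has seen so far."""
    return slope * traffic_level + intercept

# Refitting with an extra trip changes the model: this is the "learning" step.
new_slope, new_intercept = fit_line(
    [t for t, _ in trips] + [10],
    [m for _, m in trips] + [40],
)
```

The stop sign rule is written once by a person. The travel-time estimate is recalculated from data, so its answers shift as new trips are recorded.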

Image recognition is another use for self-updating algorithms, because they ‘learn’ more as they are exposed to new data. The details go beyond the scope of this guidebook, but you can find out more in Introduction to Machines That Learn.

Make sure that you ask questions about the models in your algorithms so that technical stakeholders can work with you to select the right strategy and tools for machine learning.

Discussion Prompt

Will you change data or add to it in any way?

Predictive Analytics

Predictive analytics uses data, statistical algorithms, and machine learning to calculate the likelihood of future outcomes. This is different from traditional analytics, which focuses only on what happened in the past.

For example, when you receive a notification predicting a flight delay while traveling, a system has compared data about current and past flights and drawn on other algorithms that process weather data. Machines are analyzing and learning constantly to provide up-to-date estimates. In the past, such alerts might only have been based on human prediction, but now machines and humans work together to make quicker, more accurate predictions.
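As a rough sketch of the idea, the simplest possible "likelihood of a future outcome" is a frequency taken from past records. Everything here is an assumption made for illustration: the flight history, the weather labels, and the 0.5 notification threshold are hypothetical, and real systems combine many more signals.

```python
from collections import defaultdict

# Hypothetical history: (weather at departure, was the flight delayed?)
history = [
    ("storm", True), ("storm", True), ("storm", False), ("storm", True),
    ("clear", False), ("clear", False), ("clear", True), ("clear", False),
]

def delay_rates(records):
    """Share of past flights that were delayed, for each weather condition."""
    counts = defaultdict(lambda: [0, 0])  # weather -> [delayed, total]
    for weather, delayed in records:
        counts[weather][0] += int(delayed)
        counts[weather][1] += 1
    return {weather: d / n for weather, (d, n) in counts.items()}

rates = delay_rates(history)

def should_notify(weather, threshold=0.5):
    """Send a delay warning when the estimated likelihood exceeds the threshold."""
    return rates.get(weather, 0.0) > threshold
```

With this sample data, `rates` is 0.75 for storms and 0.25 for clear weather, so only a stormy forecast would trigger a warning.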

Natural Language Processing & Sentiment Analysis

'Natural language processing' is another form of analysis. It's a set of algorithms designed to make mathematical representations of verbal communication. Additional algorithms are used to analyze sentiment within that verbal communication. Siri, Google Assistant, and Alexa all rely heavily on natural language processing.

One example can be found in the video conferencing software Uberconference, which uses natural language processing and sentiment analysis to provide transcripts and feedback to users about their calls.

Many conference call services now offer a feature where machines transcribe everything we say during a call and then highlight action items, questions, and key moments of discussion to make it easier to review meetings and convert ideas into action. This is a potentially huge boost to productivity and can surface important information we might otherwise have missed.

You can think of natural language processing as applying to anywhere verbal communication (written or spoken) happens. You might analyze social media posts for word choices to see if people are frustrated with a company, or you might analyze news stories to spot any disturbing trends showing up about a company that you're invested in.
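A very small sketch of word-choice analysis might look like the following. It assumes a hand-made word list and made-up posts, both hypothetical. Production sentiment analysis relies on trained language models rather than simple word counts.

```python
import re

# Hypothetical word lists chosen for this example only
POSITIVE = {"love", "great", "helpful", "fast"}
NEGATIVE = {"frustrated", "broken", "terrible", "refund"}

def sentiment_score(text):
    """Count positive words minus negative words in a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = [
    "Love the new app, support was fast and helpful",
    "Still broken after the update. I am frustrated and want a refund",
    "Picked up my order today",
]
scores = [sentiment_score(post) for post in posts]
```

Here `scores` comes out as `[3, -3, 0]`: one happy customer, one frustrated customer, and one neutral post. A score like this turns language into numbers that other analyses can use.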

You can find out more about Natural Language Processing and Sentiment Analysis here


Ethics in the Analysis Phase

The analysis phase presents many ethical challenges—and opportunities.

Watch Out For Bias

It is easy to think of algorithms as just math formulas. However, when they are mathematical abstractions of human values (like trust, for example), the developers creating them can unconsciously incorporate their own biases. Those biases can be hard to remove, especially once a complex set of systems is in place. Therefore, when developing predictive analytics or machine learning algorithms, make sure that you are managing bias in the process.

This bias issue arose when Goldman Sachs, which hosts the Apple Credit Card infrastructure, developed an algorithm that screened applicants for creditworthiness. In certain situations, the algorithm came to the erroneous conclusion that women were less creditworthy than men. In one notable case, a couple with completely joint finances received very different offers: the husband received a credit limit 20 times higher than that of his wife, despite her superior credit score. It was a scandal, and because the two firms involved were so interdependent, it was hard to identify which was responsible.

When an algorithm is complex and not visible to others, it’s difficult to do the forensic analysis needed to determine the source(s) of an unintended outcome. Was the data fed into the algorithm inaccurate? Or is the algorithm itself wrong?

With feedback loops between the data and ‘self-teaching’ algorithms, the cause of such unintended outcomes can be a mystery for users and a black eye for major brands.
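One basic way to start managing bias is to compare an algorithm's outcomes across groups of otherwise similar people. The sketch below is an assumption-laden illustration: the decision records, group labels, and the review threshold of 1.25 are all hypothetical and do not describe any real lender's data or method.

```python
from statistics import mean

# Hypothetical decisions: (group, credit score, credit limit offered)
decisions = [
    ("men", 720, 20000), ("men", 700, 18000), ("men", 740, 22000),
    ("women", 730, 2000), ("women", 710, 1500), ("women", 750, 2500),
]

def average_limit_by_group(rows):
    """Average credit limit offered to each group."""
    limits = {}
    for group, _score, limit in rows:
        limits.setdefault(group, []).append(limit)
    return {group: mean(values) for group, values in limits.items()}

averages = average_limit_by_group(decisions)
ratio = max(averages.values()) / min(averages.values())

# Hypothetical rule of thumb: a large gap between groups triggers human review.
needs_review = ratio > 1.25
```

In this sample the groups have similar credit scores, yet one receives limits ten times higher on average. A gap like that does not prove where the problem lies, but it tells you that the data and the algorithm both need a closer look.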

In conjunction with leading researchers, including Accenture, Causeit wrote an extensive guide to these issues, Data Ethics and Data Politics.

Create Feedback Loops

To avoid unintended consequences from data use, integrate active feedback loops from customers.

You might see this in the form of a recommendation from Amazon coupled with the question, “Did we recommend the right product?” Similarly, Facebook might ask you whether the face it identified in a photo you uploaded is correct.

A crucial part of raising data fluency is finding ways to help humans steer the direction of machine algorithms.
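A feedback loop can be as simple as recording each customer's answer and checking the results regularly. The following is a minimal sketch, assuming a hypothetical `FeedbackLog` class and an arbitrary 70% approval target. It is not how any particular company implements this.

```python
from collections import Counter

class FeedbackLog:
    """Collects yes/no answers to a question like 'Was this recommendation right?'"""

    def __init__(self):
        self.answers = Counter()

    def record(self, was_right):
        self.answers[bool(was_right)] += 1

    def approval_rate(self):
        total = sum(self.answers.values())
        return self.answers[True] / total if total else None

    def needs_review(self, minimum=0.7):
        """Flag the algorithm for human attention when approval drops too low."""
        rate = self.approval_rate()
        return rate is not None and rate < minimum

log = FeedbackLog()
for answer in [True, False, False, True, False]:
    log.record(answer)
```

With two "yes" answers out of five, the approval rate is 0.4 and `log.needs_review()` returns `True`. That gives people a clear signal to step in and steer the algorithm.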

Analyze Data "At the Edge"

Sometimes devices analyze and obtain insights for users without ever sending raw data to central systems. If we can analyze at exactly the same point that data enters the supply chain, we may never need to risk accessing personally-identifying information at all. For example, Apple prefers an on-device approach, doing much of its processing (facial identification, voice recognition, etc.) securely on users’ phones, rather than passing raw data (like face biometrics or voice recordings) to the cloud as often as Amazon and Google do.
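The pattern can be sketched with a hypothetical wearable device that summarizes sensor readings locally and shares only the summary. The heart-rate samples and field names below are invented for illustration and do not represent any vendor's actual design.

```python
from statistics import mean

def summarize_on_device(heart_rate_samples):
    """Runs on the device and returns only aggregate values, not the raw readings."""
    return {
        "sample_count": len(heart_rate_samples),
        "average_bpm": round(mean(heart_rate_samples)),
        "max_bpm": max(heart_rate_samples),
    }

# Hypothetical raw readings: in this design they never leave the device.
raw_samples = [62, 64, 71, 90, 88, 67]

# Only this small summary would be sent to a central system.
payload = summarize_on_device(raw_samples)
```

The central system still learns something useful (an average of 74 and a peak of 90 beats per minute), but it never holds the detailed readings that could identify or profile a person.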

Discussion Prompt

Will you change or add to the data in any way after analysis?

Exercise

Asking Data Questions

Think of an example of big data or little data related to your work. Now, imagine what kind of information or insights you could get by analyzing that data.

Example: If I analyzed a collection of data about my typing accuracy and speed cataloged by time of day, I could look for patterns in how many errors I make in the morning versus the afternoon.
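The typing example could be explored with a short script like this one. The session records are hypothetical, and splitting the day at noon is an assumption made for the sketch.

```python
from statistics import mean

# Hypothetical typing sessions: (hour of day, words per minute, errors per 100 words)
sessions = [
    (9, 62, 3.0), (10, 65, 2.5), (11, 64, 3.5),
    (14, 58, 5.0), (15, 55, 6.0), (16, 57, 5.5),
]

def average_errors(rows, start_hour, end_hour):
    """Average error rate for sessions that began within the given hours."""
    errors = [e for hour, _wpm, e in rows if start_hour <= hour < end_hour]
    return mean(errors)

morning_errors = average_errors(sessions, 6, 12)
afternoon_errors = average_errors(sessions, 12, 18)
```

In this sample the morning average is 3.0 errors per 100 words and the afternoon average is 5.5. That is a descriptive analysis, and it might lead to a decision such as scheduling careful writing before lunch.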

Recap

  • Analysis of data often falls into the categories of describing, diagnosing, recommending, deciding, predicting and/or discovering
  • Algorithms are the collections of steps computers follow to analyze data
  • Predictive Analytics is the use of algorithms to predict the future based on the past
  • Machine learning is the more practical implementation of what is often called artificial intelligence
  • Natural language processing & sentiment analysis is the use of machines to understand human language and meaning
  • Create feedback loops whenever using algorithms that affect human beings to make sure the output of those algorithms is useful and safe in the real world
  • Watch out for bias in algorithms—they're often made by humans, after all

Discussion Prompt

  • What analysis will you do of the data?
  • Will you change data or add to it in any way?