Demystifying Data Science
Data Science: “Give me headlights and take the wheel”
What could data science do for your business? In the second of this 3-part series, data scientist Mayank Sharma outlines some real ways businesses are using algorithms and computational models to meet corporate objectives.
Prediction and automation
The strategic value of data science for a company depends on what kinds of uncertainties you’re facing. If you’re looking at aggregate financial information and the health of the company, then you don’t need extremely sophisticated algorithms just to figure out where you are. Where data science plays a role is in predicting where you will go next - to know more of the factors that contribute to the growth and risk in your business.
“Give me headlights” - making predictions from big data
Every business leader needs to understand the factors that affect the company’s health and risk levels, and how to move the right levers (investments, actions, decisions) so that the business heads in the right direction. In the old world, as I explained in part 1, statistics was typically used for descriptive analytics - explain to me what I’m doing: characterise the distribution of revenue, the distribution of delays in certain activities, and so on. Predictive analytics became much more popular in the last decade - where people said “give me headlights” - give me dashboards that tell me where my revenue will be next year.
Business leaders are starting to see where they’re headed. But can they change course?
The biggest laggard, the way I see it, is closing the loop with prescriptive analytics - where your tool tells you what you need to do to get on the right course. And even now, when you look at very sophisticated predictive analytics or deep learning systems out there, two key attributes are often missing in the outputs of these systems: explainability and interpretability.
A lot of people will downplay these terms, but as a person who is invested in prescriptive analytics, if I don’t give you both explainability and interpretability then I have come up short in making the expected impact for the business. If I can’t interpret for you the key features or model parameters that affect your business targets (interpretability), and if I can’t explain what actions cause what outcomes (explainability), then I can’t make effective recommendations on how to change your actions to achieve a different end.
At the frontier of Data Science
An industry that’s seen some success in describing, predicting and prescribing actions is supply chain and logistics. Some companies - Amazon, for example - have been able to really sharpen their ability to take analytics all the way from learning what they’re doing, to predicting what might happen, to actually controlling their distribution network tightly: controlling costs, and controlling the ecosystem so that they can extract the maximum value out of it. But again, there are a large number of logistics and supply chain companies out there that are not able to do the last part - figure out how to change things.
With new tools built on machine learning, some of this prescriptive ability is beginning to emerge in other industries as well. With the vyn platform, for example, users get ‘next step’ recommendations at key decision points in a workflow, and leading performance indicators help to prevent bad outcomes before they happen. Data science models have the power to drive smarter decisions because the algorithms can tell you what needs to be done and when.
Workflow automation - when to replace humans with algorithms
Automation is often a prerequisite to being able to change the way a business operates. Essentially, you can use your data science tools to identify and enable opportunities for automation. The objective could be to boost revenue, reduce cost, or both - depending on what stress you’re under and what type of business you operate.
But be prepared to be surprised when your analysis leads you to the answer that for certain workflows - often consisting of tasks requiring higher cognitive abilities - people are vital. And even if large efficiencies may seem achievable, let us make no mistake: there are serious sociological ramifications of automation, especially the wanton kind. Businesses will eventually have to ask the hard question: “I can, but should I?” That, though, is a complex topic, better discussed separately.
The challenge across the board is, what is the appropriate level of automation? If the operating environment or workloads change faster than we can reliably learn from the data, can we trust a machine to keep up? Or imagine you’re training an algorithm to classify outcomes as “good” or “bad”. You decide, based on your business needs, that you will only use the algorithm once it is correct 99% of the time. If your data is not good enough - if you lack contextual information or if it’s not timely enough - the algorithm may only be 80% accurate, and that falls short of the bar you set.
Three approaches to dealing with these kinds of accuracy issues are:
1. Order your problems by complexity: Some will be simpler and others very complex, and you hope the machine can process the simpler ones. You might get 99-100% accuracy on the simple problems, while the really hard ones (say, 5% of the total) are left for real people to work on.
2. Improve the quality of your data: If your sources of data are incomplete, outdated, or faulty in some way, of course you will get a low accuracy rate. In these cases, you should review the reliability and timeliness of your data.
3. Add context or ‘dimensions’: Many businesses don’t know what data is relevant to their problem. Ask yourself, what other factors could possibly be influencing these outcomes? By adding more dimensions to your data, you permit better feature engineering, lifting the problem out of its nominal dimensions and into a higher-dimensional space more suitable for exploration.
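To make the first approach a little more concrete, here is a minimal Python sketch of splitting work between the algorithm and people. It rests on assumptions that are not part of the discussion above: that the model reports a confidence score with each prediction, that low confidence is a reasonable stand-in for a "hard" problem, and that you have historical cases where the true outcome is known. The names (Case, lowest_cutoff_meeting_target) and the ten sample cases are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Case:
    case_id: str
    confidence: float  # model's self-reported confidence, 0..1 (assumed to exist)
    predicted: str     # "good" or "bad", as output by the model
    actual: str        # true outcome, known from historical data


def accuracy(cases):
    """Fraction of cases where the prediction matched the true outcome."""
    if not cases:
        return 0.0
    return sum(c.predicted == c.actual for c in cases) / len(cases)


def split_by_confidence(cases, cutoff):
    """Cases at or above the cutoff go to the machine; the rest go to people."""
    automated = [c for c in cases if c.confidence >= cutoff]
    manual = [c for c in cases if c.confidence < cutoff]
    return automated, manual


def lowest_cutoff_meeting_target(cases, target_accuracy):
    """Smallest confidence cutoff whose automated share meets the target.

    Returns None if no cutoff reaches the target on this data.
    """
    for cutoff in sorted({c.confidence for c in cases}):
        automated, _ = split_by_confidence(cases, cutoff)
        if accuracy(automated) >= target_accuracy:
            return cutoff
    return None


# Invented toy history: the model is mostly wrong only when it is unsure.
history = [
    Case("a", 0.99, "good", "good"),
    Case("b", 0.97, "bad", "bad"),
    Case("c", 0.95, "good", "good"),
    Case("d", 0.93, "good", "good"),
    Case("e", 0.90, "bad", "bad"),
    Case("f", 0.80, "good", "bad"),
    Case("g", 0.75, "bad", "bad"),
    Case("h", 0.60, "good", "bad"),
    Case("i", 0.55, "bad", "good"),
    Case("j", 0.52, "good", "good"),
]

if __name__ == "__main__":
    cutoff = lowest_cutoff_meeting_target(history, target_accuracy=0.99)
    print("overall accuracy:", accuracy(history))
    print("cutoff for 99% target:", cutoff)
    if cutoff is not None:
        automated, manual = split_by_confidence(history, cutoff)
        print("automated:", len(automated), "sent to people:", len(manual))
```

On this toy data the model as a whole falls well short of the 99% bar, but the high-confidence half of the cases clears it, so those could be automated while people handle the rest. In practice the cutoff would need checking on far more data than ten cases, and rechecking as the operating environment changes.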
For some problems, algorithms may never get to 100% accuracy; for others it may happen in a year, two years, or five. What’s important is that the business has a clear idea of what level of accuracy it needs.
Business leaders - all of us - are prone to generalisations, we’re prone to following our gut, and we think we know what factors affect an outcome. Data Science is really the open-minded, exploratory, scientific process of asking what are the key hypotheses, what data and algorithms will best answer your questions, and what is a principled way of making decisions under uncertainty.
This is part 2 of 3 in a blog series called “Demystifying Data Science”. Part 3, “Avoid the pitfalls of Data Science: 3 steps to success” will be published soon.