This is a version of one of my talks, ‘The Data Difference,’ that I gave for Black Swan, which discusses how the successful use of data depends on human intelligence even while mitigating its flaws. Data-driven algorithms are unaffected by common biases in human reasoning and are far better than people at many tasks. However, enthusiasm for ‘Big Data’ can obscure the fact that the success of data-driven decision-making still relies upon human intelligence to identify the right data for the right problem at the right time.
How I became interested in data and decision-making
I’ll start off with some slides, to give you some context as to the purpose of this talk.
I met Steve a year and a bit ago and did some work on Black Swan strategy. When I finished that we were discussing what to do next. One of the challenges I’d seen looking at the strategy and doing that work is that it’s often harder than it should be to articulate to potential clients — or people generally — what the value of this work really is.
I don’t actually like the term ‘big data’ very much. You can read a blog post1 by me many years ago in The Guardian saying why I don’t like the term. People have heard of big data in companies because it’s seen as this magic thing. But often, I think sometimes even now, it’s not clear to people how it turns into tangible business value. Big data doesn’t necessarily address the problem they actually have. One of the things that is really special about Black Swan, in my experience — and I’ve done stuff across this sector for 15 years — is that there’s a really good connection between knowing the tech stuff — knowing the data — and really applying it in ways that are actually going to make a difference to somebody. Black Swan actually understands business problems, rather than just coming in with incredibly fancy technology that’s supposedly going to save the world and costs a lot of money. Part of what’s going on at the moment is that companies are challenging that and saying, ‘How will data actually make a difference?’
So part of the purpose of these talks is to introduce some companies to how data actually can make a difference.
I’ll tell you a tiny bit more about me before I begin. I grew up on a farm in Northamptonshire near Peterborough. I had seven dogs and three cats, when I was a kid, and about 150 cattle and 700 sheep. You work every day of the year, that’s one of the things I learned about farming. Including Christmas — especially Christmas — because no-one else works on Christmas.
But I ended up in technology, with the result that my mother, who’s the farmer, still doesn’t understand what I do. I worked particularly in digital policy and data. I helped build, for example, a lot of open data portals around the world. Some of you may have scraped data from open data portals like data.gov.uk, data.gov or a national statistics office site. I helped build those sites and worked in open data policy and various other areas within open data.
So I’ve ended up fascinated and passionate about how we can use digital technology to make a better world and in particular how we can use it to answer questions. I was incredibly curious as a kid and still am today.
When I was 17 years old, I applied to university. Among other places, I applied to Cambridge. I really wanted to go to Cambridge. I think it was about the 3rd of January and I remember coming back from a trip I’d gone away on with my girlfriend. I came back and met my father, who had this really grim expression on his face — although I didn’t know what that meant if anything, as my father generally had quite a grim expression on his face.
He said, ‘I’ve got this envelope for you.’ And as usual, for my father, he’d opened the envelope even though it was addressed to me. It was the letter from Cambridge and it began with those words you generally don’t want to read which say, ‘Thank you so much for applying, but we’re sorry to inform you that we’re not going to offer you a place to read mathematics.’ I had really thought I would go — I’d been told by my teachers I was good enough — and it was this crushing moment of realising that I wasn’t going to get in.
Then about a week later a ray of hope arrived; they sent me a second letter saying they weren’t accepting me, but they would put me in something called the pool. For those of you who don’t know, Cambridge has a centralised admissions system, so people are usually accepted by individual colleges, but some people who aren’t accepted by an individual college are put into a central pool, and some colleges that don’t have enough people might take them. And I was lucky enough to get an offer from another college who said that if I got especially high grades I would be allowed in. I got those grades and I went to Cambridge and I read mathematics.
The college I had applied to was called Trinity — the one in the film Chariots of Fire — which is the best college for mathematics. I graduated after three years, and they actually rank you in order when you do mathematics. It’s supposed to be secret, but the list gets shared somehow. So I knew that I came fifth in the Tripos. I came number five out of 235 people. That meant I had beaten all but two of the 35 people who had gone to Trinity in the end, which I was very satisfied about; I felt they had made this big mistake during the admissions process and it still rankled me.
The thing was that 10 years later I was a fellow at Cambridge in Economics. So I’d become an academic and I was suddenly on the other side of the table: I had 17-year-olds come in for 30 minutes with me, I’d read their transcript and I’d read their personal statement. If any of you here applied to university in the UK, you may remember you had to write a personal statement.
I would interview applicants for about 20 to 30 minutes and have to make a decision on their future. About two-thirds of them I would have to reject. We accepted one applicant in every three at the college I was at and generally that was the acceptance ratio. I remembered how wrong I thought it had gone when I applied and I remembered very clearly that it had gone wrong in the interview; I’d gone into the interview room, they had asked me to factor some factorial and I’d screwed it up. I remember when I left the interview room and I was thinking, ‘Oh, my God, I know how to do that. How could I have done that wrong?’ Has anyone ever had a job interview like that? I could see my chances slipping away at that moment. I’ll never know what meant I didn’t initially get an offer but it seemed like that was the reason. Plenty of people freeze up in interviews. How do you know in that 30 minutes what someone will be like for three years?
So I got really curious and asked around in Cambridge to see if we had any statistics. Did we have any data on the people we accepted? Or the people we pooled? Because that would be a nice sample — the people who were almost rejected, but who made it. Or even data on how people were rated? Because students were rated when they were accepted. How did that compare to how they did in their degree three years later? Did we have anything that would tell us how reliable our system was? The answer was no. There was no data at all. So I got interested in this. That interview had been close to having a very major effect on my life, and whether or not other people got into Cambridge had a major effect on their lives. So I’m fascinated by how we make these kinds of decisions.
Data-driven algorithms compensate for many of the biases in human decision-making, but we’re often reluctant to accept this
I’m not alone in questioning the reliability of these kinds of decisions; people have thought about this for a while. There’s something called the Oregon Research Institute where Lewis Goldberg has asked these questions about a whole range of areas. He asked: what about admissions to a psychiatric hospital? That seems like a bigger deal than getting accepted to university! Do you get put in a mental institution, or not? Do you get let out of a mental institution or not? How good are psychologists at making this diagnosis? Or, for example, when they let you out, how good are they at predicting whether you’ll come back, or that you’ll have another psychotic episode?
What about health? What about doctors? When a doctor looks at your cancer scan, how reliable is the doctor in saying, ‘yes, it’s cancer,’ or ‘no, you’re fine’? And Lewis Goldberg didn’t really want to evaluate how good the process was. He just wanted to see how you could model it — he was psychologically interested. Of course, it’s a very complicated process. Each decision involves taking loads of factors into account. Just think of these students I interviewed; they have their transcript, they have the books they’ve read, they tell me stuff at interviews, they went to different schools etc.
Goldberg looked at various areas. One of the areas he looked at was cancer. He and his colleagues at Oregon went down to the major Oregon teaching hospital and said, ‘Okay, we’d like your best oncologists to explain how they diagnose a cancer scan. If you see an X-ray of someone’s stomach and there’s an ulcer, how do you go about diagnosing whether that ulcer is a benign tumour or a cancerous tumour?’ And they said, ‘We can absolutely explain how we do it. There are seven factors we look at, including how big the cancer is, whether it’s been growing and whether the edges are rough or smooth. But it’s not simple; if the cancer is big and it’s got rough edges, that means something different from if it’s small and it’s got rough edges. There are complex interactions between different variables.’ So Goldberg thinks, ‘Well, God, it’s going to take years of research work to understand this process,’ which was great — that’s what he wanted. But to start with he wanted to keep it simple. Has anyone heard that phrase, ‘Keep it simple, stupid’? The great piece of advice for all software engineering in the world. So Goldberg asked, ‘What is the simplest model I could make with seven factors?’
The simplest model would be to give each of the factors an equal weighting and sum them. A simple linear model. Goldberg got 96 X-rays and made a copy of them all; he doubled them and mixed them up, and then showed them to all the oncologists and got them to put the X-rays on a scale from totally benign to definitely cancerous.
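Goldberg’s ‘simplest model’ is easy to sketch. Here’s a minimal illustration in Python; the factor names and ratings are made up for the example, not the actual seven clinical criteria:

```python
# Equal-weight linear model: score a case by averaging its factor
# ratings with equal weights. Factor names here are illustrative
# stand-ins, not the actual seven clinical criteria from the study.

def equal_weight_score(factors):
    """Average the equally weighted factor ratings (each rated, say, 0-10)."""
    return sum(factors.values()) / len(factors)

# A hypothetical ulcer, rated on three of the (illustrative) factors.
ulcer = {
    "size": 7,            # larger lesions rated higher
    "edge_roughness": 8,  # rough edges rated higher
    "growth": 6,          # evidence of growth over time
}

# Higher score means closer to the 'definitely cancerous' end of the scale.
print(equal_weight_score(ulcer))  # 7.0
```

The point is precisely that this is as dumb as a model can get — no interactions between variables, no learned weights — and it still beat the best doctor.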
Then he got his results. And his findings were surprising. First of all, every oncologist had seen each of the 96 X-rays twice, in a random order, and every oncologist diagnosed at least one pair of duplicate X-rays significantly differently. Even more significantly, there was very little correlation or agreement among the oncologists. So one might say, ‘You’ve definitely got cancer,’ and another would say, ‘No, it’s fine,’ based on the same X-rays. That in itself might give you pause when you next go to the doctor. Secondly, the simple algorithm — this simple seven-factor, equally weighted algorithm — did better than the best doctor.
That study was published in 1968 as the result of about eight years of work. There was also a famous 1954 book on clinical prediction by the psychologist Paul Meehl, which basically showed that simple algorithms consistently do better at predicting various things, for example whether someone will have a psychotic episode or whether they will come back into treatment. Algorithms just perform better at diagnosis than almost every human expert. There are other, even better findings — or worse findings if you’re a psychiatrist. For example, first-year psychiatry students do as well as psychiatrists with 30 years of experience at making these diagnoses.
So there are hundreds of studies like this, on ‘man versus simple models of man’, showing that simple data-driven models do better than experts. They almost all consistently show, at least in the areas they studied, that algorithms do better: at predicting the outcomes of elections, predicting the outcomes of football games, and so on.
Why is that? I’m going to offer three reasons to you. First of all, I told you I grew up on a farm. There is a super secret fact that only I may know, which is that sheep can count to three. They can count one, two and many and that’s kind of it. The reason I know this is that as a kid my mother explained to me that most sheep only have two lambs, but sometimes they have four. She said when they have four, you’ve really got to watch out, because they won’t know when they’ve lost a lamb. When they’ve got two lambs, they know when one of them goes missing, but when they have four, they’re like, ‘Well, I just had many — I don’t know whether it was three or four,’ and they lose them. Now, before you think, ‘My god, sheep are so dumb,’ what do you think humans can count to without a numerical counting system where you count off against something in your head? It’s only about six or seven. If you didn’t have language, didn’t have a counting system, and I threw, for example, pebbles on a table, you would be able to count about six just kind of perceptually. Above that, you have to start counting off.
Animals in general are phenomenal, and humans are really powerful, incredible beings. We have vision; I can talk to you and stand up at the same time — that’s a big deal, that’s not a trivial thing to do. But we weren’t really designed to do arithmetic. It’s not a hugely valuable thing, evolutionarily, to be able to count to 1000. It sometimes happens — the protagonist in Rain Man can count at a glance how many matchsticks are on the ground. Certain people have that ability, but it’s not a very useful ability generally. In this world, where we have a lot more data and information, it’s becoming more useful, but the point is that machines are very good at counting compared to humans.
I can illustrate the second reason with another true story. Jennifer is a real woman who had a car accident just outside of Toronto. After the crash, she’s taken to the emergency room in the hospital and X-rayed. She has a lot of broken bones — she’s had a frontal collision — but one of the things that worries the doctors is that she has an irregular heartbeat.
This is not good news, and they’re thinking ‘What could this be?’ But they haven’t spotted anything, so they leaf through her notes very quickly and they say, ‘Ah, she’s got a thyroid problem. She’s got an overactive thyroid.’ One symptom of hyperthyroidism is an irregular heartbeat. So the doctors go ‘Great, okay, we don’t need to worry about that because it’s not life threatening, we can treat her later with some thyroid drugs. Let’s get to work.’ And then one doctor called Don Redelmeier, who’s been trained in decision-making at the hospital and who has now published a lot of papers on medical decision-making, says, ‘Wait. We have a checklist process here. And we also need to stop when we’re making decisions like this and think, because humans build stories, right? Humans have what is called this sense of representativeness.’
For example, is it more likely that a family had three boys and then three girls, or boy, girl, boy, girl, boy, girl?
The correct answer is that they’re both equally likely. But many people think that boy, boy, boy and then girl, girl, girl is less likely than the scenario where the boys and girls are alternating. People have a sense about what is normal and representative. At this moment in the story, the doctors have a story about Jennifer, which is that she’s got a thyroid problem, which is associated with irregular heartbeat. By default that would be a good diagnosis. But right now she’s just been in a frontal car accident. There could be a lot of other things. And so the doctors pause and they look harder and find Jennifer has a fractured rib that they hadn’t spotted on the X-ray, which has punctured her lung. Lung punctures cause an irregular heartbeat. And they give her treatment, without which she would have died then and there.
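You can check why the two sequences are equally likely by brute force. A quick sketch, assuming each birth is an independent 50/50 event:

```python
from itertools import product

# Enumerate every possible six-birth sequence of boys (B) and girls (G).
# Each specific sequence has probability (1/2)**6, so BBBGGG and BGBGBG
# are exactly as likely as each other, even though the alternating one
# 'feels' more representative of randomness.
sequences = list(product("BG", repeat=6))
p = 1 / len(sequences)

print(len(sequences))  # 64 equally likely sequences
print(p)               # 0.015625 each, for BBBGGG and BGBGBG alike
```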
So the thing about humans is we get into stories about things. Once we have a story, we look for evidence that fits the story we have. We have all kinds of cognitive biases. There is this very, very famous paper by Tversky and Kahneman, published in 1974, which is really worth reading and very readable.2 They talk about these three heuristics that lead to cognitive biases: representativeness, availability and anchoring.
I can give you an example of anchoring you might love. Imagine you all came in here for this talk and I gave each of you a piece of paper with a random number between zero and 100. So some of you have got five or eight, and some of you got 94, or 90, or 83. And then I asked you to write down how many African countries there are in the United Nations. There would be a significant difference between your estimates. The people who had a low number would estimate around 25 on average and people who had a high number would estimate 45. I simply gave you a random number, but as a result of that your prediction about something completely unrelated will be significantly anchored. It’s why people in advertising and sales want to give people a number; if you’re going into a business negotiation you want to give a really high number and then negotiate down because it anchors people.
Then there’s availability bias. If I asked you to think of words that begin with ‘k’ and words that have ‘k’ in the middle, it would be easier for you to think of words beginning with ‘k’. There are many more words in English that have ‘k’ in the middle of them, but you can’t recall them. The human mind is not good at searching for words with ‘k’ in the middle, but we’re good at thinking of words that begin with ‘k’; ‘key’, for example. So this is a cognitive bias to be aware of: differences in how readily we recall facts and examples can distort our perception of what is actually the case about the world.
Another story, which is described in their paper, is from the Arab-Israeli war in 1973, when lots of Israeli pilots were getting shot down in one particular group. The person in charge of the group was saying, ‘Should we stop them flying? What’s going wrong?’ And the response was, ‘Listen, this is just a tiny sample. These pilots have flown three missions, and you’ve got a group of about 15 people, you can’t statistically conclude anything.’
The best one of these stories is about reversion to the mean. Let’s say you have a kid, and your kid does really, really, really well on a school test, way better than normal. So you say, ‘Oh, well done, Jason, you did so well. That’s great.’ Then next time, he’s back to average. Maybe you think, ‘What that teaches me is that I shouldn’t reward him.’ So next time he doesn’t do so well, you say ‘Jason, that’s it. No TV after supper this week until you get better at your math.’ And guess what? The next time he sits a test, he does way better. You think, ‘Yes. Punishment works, rewards don’t.’
Now, why is that an example of fundamentally flawed human decision-making? Well, there’s a grade which is how well Jason is going to perform on average. When he does better, it’s just likely that next time he’s going to do worse because he’s going to go back to his average performance. But you’re going to reward him and then think, ‘God, I should never reward him because he’s got worse.’ And conversely, every time he does badly, it’s the case that on average next time he’ll do better. But you might conclude, ‘Punishment really works.’
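This is easy to demonstrate with a simulation. The sketch below uses made-up numbers — a fixed ‘true ability’ plus random noise on each test — and shows that the score following an unusually good one tends to fall back towards the average, with no praise or punishment involved:

```python
import random

random.seed(42)

# Jason's 'true' ability is fixed; each test score is ability plus noise.
# (The numbers are invented purely for illustration.)
ABILITY, NOISE = 70, 10

scores = [ABILITY + random.gauss(0, NOISE) for _ in range(100_000)]

# Collect the score that followed each unusually good score (> 85).
after_good = [nxt for prev, nxt in zip(scores, scores[1:]) if prev > 85]

# The follow-up to a great score is, on average, just... average.
print(sum(after_good) / len(after_good))  # close to 70, not 85
```

The lesson: nothing about the reward ‘caused’ the drop — the next score was always going to regress towards the mean of 70.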
There’s an idea that Israeli Jews are somewhat pessimistic about the world, and there’s a great phrase in this paper where Tversky and Kahneman conclude that what this story teaches us is that we will be consistently punished for rewarding people, and rewarded for punishing them, even though, at the psychological level, it isn’t having the effect it appears to be having.
So there are these fundamental biases. I think I’ve left the best one until last. This one is a big deal. So you think Cambridge entrance exams matter? What about getting out of jail? Decisions about whether you get parole? Coincidentally, this study was also done in Israel. Judges sit there and they have to decide whether someone’s going to get parole — whether they’ll be let out of jail to go back to their family and into the community. This study was carried out over eight months with about six different judges, so it covers a long period. This graph shows the proportion of favourable decisions. The y-axis shows the likelihood of parole, where 1 means parole will definitely be granted and zero means parole definitely won’t be granted. The x-axis shows time during the day. So at the beginning of the day, if you’re asking for parole, there’s a 65% chance of it being granted. The morning goes on. By the time you hit lunchtime, there’s almost a zero percent chance of getting parole. Then the judges go off for their meal break. They come back from their break and it’s back to about 65%.
Then it falls again till the afternoon break when it’s gone back to about 10%. And then it goes up again after afternoon tea. Right? And then it falls to close to zero at the end of the day. This has been corrected for all the other variables you could think of. This is just caused by time of day. The basic logic of this, psychologically, is that granting parole is harder than not granting it. By default, if you don’t grant parole, they stay in jail, but there isn’t the possibility they’re going to go out and rape someone or the possibility of the judge thinking, ‘Oh, my God, I let them out and something terrible happened and it’s my fault.’ So when it becomes stressful, and when the judge is cognitively overloaded, the chance of granting parole decreases.
For a similar reason, you don’t want to go to a business meeting where a client has to sign off a lot of money just before they’re about to eat — it’s just a bad idea. When people have to make big decisions, particularly ones that may be potentially cognitively stressful — not psychologically but cognitively stressful — then you want to do it when people are well rested.
But the thing I want to get across is that of course machines don’t get tired in the same way. They can wear out but they don’t, over the morning, decide ‘Oh I’m just tired now, I’m just not going to predict so well anymore, it’s time for a break. I need some more electricity.’ People have known about these ideas for a while now; that paper was published in 1974. The ideas have filtered through to some extent and there’s a really good book about this by Michael Lewis called The Undoing Project, which some of this is borrowed from, but these ideas aren’t accepted that much.
Organisations don’t act on this information — take Cambridge admissions. There’s a huge amount of evidence that interviewing people in person may actually be bad for your decision-making and hiring. In person, you’re overwhelmed by people being charming or not charming. If people are really charming you’ll want to hire them, but charm actually has very little correlation with their ability to do the job. In fact, it’s probably more important to do interviews because people like to be interviewed — it turns out people feel they’re not valued in the hiring process if they’re not interviewed — than as a way to discover anything.
It’s even worse with big people, by the way; we tend to be charmed even more by very tall people. They’re big so we want to be nice to them. The thing is, people don’t accept these things. There are several reasons. Firstly, people feel threatened. There’s a sense of, ‘How can a machine be better than a human?’ It can cause people to feel threatened in their job, which is an important fact to think about when you’re talking with people about this. There can also be a simple sense of, ‘Oh, it can’t be so.’
So to summarise, data-driven algorithms are surprisingly good. They often perform better than human experts in areas ranging from forecasting the outcomes of football games to diagnosing cancer. The reason is that humans use approximate heuristics and are not well adapted to solving these types of problems. Human beings are extraordinary in many ways, but we’re just not really good at this. I mean, the simplest example I can think of is if you go to a supermarket, you don’t say, ‘Oh, I reckon this is £35 of shopping’ and the cashier doesn’t say, ‘No, I think it’s £27.’ No — you have a machine that adds it up. Humans are not good at adding up 200 five-digit numbers in their heads. It’s just not something we’re well adapted for. However, we remain reluctant to accept these facts.
At the same time, deriving value from data is surprisingly hard and more data isn’t always better
I want to turn now to the importance of smart data. The basic story is we want to do more data-driven decision-making, but we also want to be smart about doing more of it. I’ll tell you another story to illustrate this. Does anyone here read Borges? He writes really beautiful short stories. They’re very short. You can read them over lunchtime. He never actually won the Nobel Prize for Literature, though he was a perennial favourite for it, so he’s good. He wrote a story called The Library of Babel. The library in the story is made up of hexagonal rooms full of books. Every room opens north, south, east and west and the library also goes up and down. As far as anyone can tell, the library goes on forever. You can keep wandering through the rooms and you never get to the end of it. About ten years before the story opens, the librarians in the library have had an incredible breakthrough; they’ve discovered the books are not of varying length — they’re all 410 pages exactly. Every 410-page book that you could possibly write is in that library.
This means the library’s not infinite, which is a very good thing. At first the librarians are overjoyed, because this means that inside the library is the answer to every question. Is there a god? What is the purpose of life? What is the best thing to eat for breakfast? All those questions are answered in a book somewhere in the library. It’s incredible, right? Every question we can have is answered there. But then, just around the time the story opens, a very sad thought has come into existence. If the library contains every truth, it must also contain every falsehood, and it also contains every approximate truth; just as it has every true book, it has every book which is a little bit true, but not quite. There’s also no way to tell the difference. So despair has overwhelmed the librarians.
Back to the topic of big data, this is a problem some of your clients may face; they thought that more data was better, whereas of course, new challenges actually arise when you have more data. The more data you have, the more false correlations you have — the easier it is to get things you think are predictions, but aren’t. So the point of this story is that you want to be selective. You don’t want to just go fishing anywhere; you want to have some purpose in your fishing. Yes, there is this great data sea out there, but you want to know where the good fishing grounds are, you want to know where the kind of fish you’re looking for are. You want relevant data, not just big data, you want the data that actually addresses the problems you’re trying to solve.
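A small sketch makes the ‘more data, more false correlations’ point concrete. Here every candidate ‘signal’ is pure random noise, yet scanning enough of them still turns up an apparently strong correlation with the target:

```python
import random

random.seed(0)

def corr(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# A target series of 30 observations, and 1,000 candidate 'predictors'
# that are nothing but noise. The best of them will still look impressive.
target = [random.gauss(0, 1) for _ in range(30)]
best = max(
    abs(corr([random.gauss(0, 1) for _ in range(30)], target))
    for _ in range(1000)
)
print(best)  # typically well above 0.5: a 'strong' relationship that means nothing
```

The more columns you fish through, the more confident-looking garbage you will catch — which is exactly why you want to fish with a purpose.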
Secondly, it’s important for data to be reliable. Last September it came out that Facebook had misestimated a key video metric for two years. I’m involved in a company called Datopian and one of my colleagues there has worked at Facebook. His take on it is that Facebook were not just being dumb in making this mistake. The problem is that going from clicks on a video on the Facebook page to this metric on the advertiser’s page involves about 600 different SQL query steps across different databases. Facebook seems to have bought almost every technology under the sun at some point (perhaps not Black Swan yet) and most of them are in operation somewhere. So it’s a complete zoo.
This raises questions about the importance of reliability. What was the impact of Facebook making that mistake? To some extent that kind of data engineering problem tarnishes a brand, diminishes trust and has significant internal costs. So you want reliable data. But how do you know, when you see that metric or that prediction, that you’re looking at what you think you’re looking at? In this case, it turned out that people thought they were only looking at video views of over three seconds, but they were looking at all video views. Any of you who’ve ever tried to communicate the design of a software system to another coder, or to any other person, will know how hard it is to be precise and how easy it is for them to misunderstand what you’re talking about. This is just going to be more of an issue as we have more automated prediction. How do we actually know we’re predicting what we think we’re predicting? How do we know that the data we’re looking at is actually the correct data, not stale data, or data accidentally produced by an error? So data quality will be very important.
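As a toy illustration of how much an ambiguous metric definition matters (the view durations here are invented sample data, not Facebook’s), the same views give two very different ‘average view time’ figures depending on whether short views are counted in the denominator:

```python
# Invented sample data: durations of ten video views, in seconds.
view_durations = [1, 2, 2, 30, 45, 60, 2, 90, 1, 120]

# Definition A: average over every view.
avg_all = sum(view_durations) / len(view_durations)

# Definition B: average over views longer than three seconds only.
over_3s = [d for d in view_durations if d > 3]
avg_over_3s = sum(over_3s) / len(over_3s)

print(round(avg_all, 1))      # 35.3 seconds
print(round(avg_over_3s, 1))  # 69.0 seconds: same data, nearly double
```

Two teams can both be ‘correct’ against their own definition and still report wildly different numbers — which is why pinning down exactly what a metric means matters as much as computing it.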
So data-driven decision-making offers an incredible opportunity, but here is where I would say just curb your enthusiasm a little bit before you rush out and order your next big data toolkit. You do want to order one, but it can be harder to derive value from data than you might think. It’s neither simple nor the solution for everything. I think this should be part of Black Swan’s pitch; that you offer an approach to data which is smart. Because it’s not just doing data, it’s doing it effectively. It’s making sure you get ROI. It’s being intelligent about the data we use. It’s being selective, not trying to solve everything. It’s knowing where to fish. Those are all quite important stories.
I imagine the client base you engage with increasingly has some mixed experiences with data solutions; maybe they bought a data warehouse that’s sitting there in the desert, pristine and beautiful, costing tonnes of dollars a year and producing absolutely zero business value. People will be having that experience sometimes. And that’s actually an opportunity to say, ‘You’re absolutely right. It takes skill and expertise to derive real business value for you from this. And we know how to do that. We’ve got all this experience and we know how to be smart in how we use this data.’ I sometimes talk about this in terms of the three R’s; the right data for the right problem at the right time.
Insight and value can only be derived from data when directed by and combined with human intelligence
I want to end with a key point, which is to remind you to bear in mind that decision-makers will often face a fear that deep down, humans are going to be supplanted by machines. They may even worry that their job specifically is under threat. So I want to emphasise a crucial point, which is that it was human doctors, and decades of research, that actually identified those seven features to look at when diagnosing stomach cancer. It wasn’t an algorithm — you had to know what to go looking for and that took human intelligence. It took humans or human-built processes to collect that data. You still have to build and design processes that collect the information you want and you still have to design the algorithms. It also takes humans to act on those predictions and insights.
I’ll end with one final story to illustrate this. Does everyone know what an ATM is? Does anyone know what ATM stands for? Automated Teller Machine. Now, does anyone know what a teller is? It’s a bank clerk; someone who, before ATMs started being used in the 70s, would count out $50 and debit it from your account when you went into the bank. This was a very large profession; hundreds of thousands of people were bank tellers. Now, since the introduction of ATMs, you might anticipate that employment in the area has gone down. Well, in fact, it’s actually gone up, because those bank clerks moved into other value adding activities. They stopped handing out as much money — which is a very low value exercise — and moved into advising about loans or mortgages. So, at least if you look at the data from the Census Bureau in the United States, that category of employment has actually gone up, even though a machine can now do an entire aspect of the work.
In general there’s a huge opportunity to actually generate more work and more activity by using this insight and information. So I would like to conclude with this suggestion, which is that what we’re looking at here is the complementarity between man and machine. We can only derive insight and value from data when it’s directed by and combined with human intelligence. In essence, that’s what Black Swan are up to here: you’re augmenting and supporting the incredible human intelligence in the organisations you’re working with, using technology that can automate the things that humans are not so good at, that can deal with our heuristics and biases to support good decision-making. Because ultimately, it’s humans who are going to have to act on whatever the prediction is, or make a decision to act on it. Thank you very much.