How to Win at Forecasting

How do people react when they’re actually confronted with error? You get a huge range of reactions. Some people just don’t have any problem saying, “I was wrong. I need to rethink this or that assumption.” Generally, though, people don’t like to rethink really basic assumptions. They prefer to say, “Well, I was wrong about how good Romney’s get-out-the-vote effort was,” and tinker at the margins of their belief system, rather than concede something fundamental (e.g., “I fundamentally misread U.S. domestic politics, my core area of expertise”).

One line of argument that’s been raised about the tournament is that getting better at short-term prediction will breed false confidence about the long term. There is, of course, no evidence to support that claim. I would argue that making people more appropriately humble about their ability to predict the short-term future is probably, on balance, going to make them more appropriately humble about their ability to predict the long-term future as well.

Another interesting variant of that argument is that it’s possible to learn in certain types of tasks, but not in other types of tasks. It’s possible to learn, for example, how to be a better poker player. Nate Silver could learn to be a really good poker player. Hedge fund managers tend to be really good poker players, probably because it’s good preparation for their job. Well, what does it mean to be a good poker player? You learn to be a good poker player because you get repeated clear feedback and you have a well-defined sampling universe from which the cards are being drawn. You can actually learn to make reasonable probability estimates about the likelihood of various types of hands materializing in poker.
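To make “a well-defined sampling universe” concrete, here is a minimal sketch, not from the source, that estimates the probability of being dealt a flush by repeatedly sampling five-card hands from a standard 52-card deck (the deck encoding and trial count are illustrative choices):

```python
import random

# A standard 52-card deck: the "sampling universe" is fully specified,
# so hand probabilities can be estimated from feedback (or derived exactly).
RANKS = "23456789TJQKA"
SUITS = "CDHS"
DECK = [r + s for r in RANKS for s in SUITS]

def is_flush(hand):
    """True when all five cards share a suit."""
    return len({card[1] for card in hand}) == 1

def estimate_flush_probability(trials=200_000, seed=0):
    rng = random.Random(seed)
    hits = sum(is_flush(rng.sample(DECK, 5)) for _ in range(trials))
    return hits / trials

# The exact answer, counting straight flushes, is 4 * C(13,5) / C(52,5) ≈ 0.00198;
# with enough repeated, clear feedback the estimate converges on it.
print(f"estimated P(flush) ≈ {estimate_flush_probability():.5f}")
```

World politics offers no such fixed deck, which is exactly what makes the learnability question interesting.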

Is world politics like a poker game? This is what, in a sense, we are exploring in the IARPA forecasting tournament. You can make a good case that history is different and poses unique challenges. Whether people can learn to become better at these types of tasks is an empirical question. We now have a significant amount of evidence on this, and the evidence is that people can learn to become better. It’s a slow process and it requires a lot of hard work, but some of our forecasters have risen to the challenge in a remarkable way and are generating forecasts that are far more accurate than I would have supposed possible from past research in this area.
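For context on what “more accurate” means here: probabilistic forecasts in tournaments of this kind are typically scored with the Brier score, the mean squared error between stated probabilities and what actually happened. A minimal sketch, with invented forecasts:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and binary outcomes.
    0.0 is perfect; an unvarying 50/50 forecaster scores 0.25; lower is better."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

events = [1, 0, 1, 0]                # what actually happened
hedger = [0.5, 0.5, 0.5, 0.5]        # never commits either way
leaner = [0.8, 0.3, 0.7, 0.2]        # leans the right way, stays humble
print(brier_score(hedger, events))   # 0.25
print(brier_score(leaner, events))   # 0.065
```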

Forecasters who were more modest about what could be predicted were actually generating more accurate predictions than forecasters who were more confident about what could be achieved. We called these theoretically confident forecasters “hedgehogs”; we called the more modest, self-critical forecasters “foxes,” drawing on Isaiah Berlin’s famous essay “The Hedgehog and the Fox.”

I don’t have a dog in this theoretical fight. One school of thought puts a lot of emphasis on the advantages of “blink,” of going with your gut. Another puts a lot of emphasis on the value of system-two overrides and self-critical cognition: giving things a second thought. For me it is really a straightforward empirical question: what are the conditions under which each style of thinking works better or worse?

In our work on expert political judgment we have generally had a hard time finding support for the usefulness of fast and frugal simple heuristics. It’s generally the case that forecasters who are more thoughtful and self-critical do a better job of attaching accurate probability estimates to possible futures. I’m sure there are situations when going with a blink may well be a good idea, and I’m sure there are situations when we don’t have time to think. When you think there might be a tiger in the jungle, you might want to move very fast, before you fully process the information. That’s all well-known and discussed elsewhere. For us, we’re finding more evidence for the value of thoughtful system-two overrides, to use Danny Kahneman’s terminology.

The question becomes, is it possible to set up a system for learning from history that’s not simply programmed to avoid the most recent mistake in a simple, mechanistic fashion? Is it possible to set up a system that learns in a more sophisticated way, one that manages to bring down both false positives and false negatives to some degree? That’s a big question mark.
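A toy illustration of the distinction, under assumed Gaussian signal and noise distributions: merely shifting an alarm threshold after a mistake trades false positives against false negatives, whereas a more diagnostic signal, the analogue of genuine learning, lowers both:

```python
import random

def error_rates(threshold, signal, n=100_000, seed=1):
    """Simulate a crude detector. Non-events emit scores ~ N(0, 1), real
    events emit scores ~ N(signal, 1); we sound the alarm when score >
    threshold. Returns (false_positive_rate, false_negative_rate)."""
    rng = random.Random(seed)
    fp = sum(rng.gauss(0.0, 1.0) > threshold for _ in range(n)) / n
    fn = sum(rng.gauss(signal, 1.0) <= threshold for _ in range(n)) / n
    return fp, fn

# Shifting the threshold after a miss just trades one error for the other.
for t in (0.5, 1.0, 1.5):
    fp, fn = error_rates(t, signal=2.0)
    print(f"threshold={t}: false positives={fp:.3f}, false negatives={fn:.3f}")

# A more diagnostic signal (threshold kept at the midpoint) lowers both.
for s in (1.0, 2.0, 3.0):
    fp, fn = error_rates(s / 2, signal=s)
    print(f"signal={s}: false positives={fp:.3f}, false negatives={fn:.3f}")
```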

There are very few people on the planet, I suspect, who believe the world to be perfectly clocklike, predictable down to the last detail. But you don’t have to go all the way to the cloudlike extreme and say that everything is radically unpredictable. Most of us are somewhere in between the clocklike and cloudlike extremes, but we don’t know for sure where we are in that distribution, and IARPA is helping us to figure out where we are.

It’s fascinating to me that there is a steady public appetite for books that highlight the feasibility of prediction, like Nate Silver’s The Signal and the Noise, and a deep public appetite for books like Nassim Taleb’s The Black Swan, which highlight the apparent unpredictability of our universe. The truth is somewhere in between, and IARPA-style tournaments are a method of figuring out roughly where we are in that conceptual space at the moment, with the caveat that things can always change suddenly.

Hedgehogs are more likely to embrace fast and frugal heuristics that are in the spirit of blink. If you have a hedgehog-like framework, you’re more likely to think that people who have mastered that framework should be able to diagnose situations quite quickly and reach conclusions quite confidently. Those things tend to co-vary with each other.

For example, if you have a generic theory of world politics known as “realism” and you believe that when there’s a dominant power being threatened by a rising power, say the United States being threatened by China, it’s inevitable that those two countries will come to blows in some fashion—if you believe that, then blink will come more naturally to you as a forecasting strategy.

If you’re a fox and you believe there’s some truth to the generalization that rising powers and hegemons tend to come into conflict with each other, but there are lots of other factors in play in the current geopolitical environment that make it less likely that China and the United States will come into conflict—that doesn’t allow blink anymore, does it? It leads to “on the one hand, and on the other” patterns of reasoning—and you’ve got to strike some kind of integrative resolution of the conflicting arguments.

That carries us over into the wisdom-of-the-crowd argument—the famous Francis Galton country-fair episode in which 500 or 600 fairgoers each made a prediction about the weight of an ox. I forget the exact numbers, but say the individual predictions ranged anywhere from 300 to 14,000 pounds. When you trim the outliers and average the rest, the estimate came to 1,103 pounds, and the true answer was 1,102. The average was more accurate than all of the individuals from whom the average was derived. I haven’t got all the details right there, but that’s a stylized representation of the aggregation argument.
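A minimal sketch of that trim-and-average aggregation, with guesses invented in the spirit of the story (mostly near the truth, plus wild outliers at both ends):

```python
def trimmed_mean(values, trim_fraction=0.1):
    """Average after dropping the top and bottom `trim_fraction` of guesses,
    so a handful of wild estimates can't drag the crowd estimate around."""
    vals = sorted(values)
    k = int(len(vals) * trim_fraction)
    kept = vals[k:len(vals) - k] if k else vals
    return sum(kept) / len(kept)

# Invented guesses: most cluster near the truth, a few are wildly off.
guesses = [300, 850, 950, 1000, 1050, 1080, 1100, 1120, 1150, 1200, 1300, 14000]
print(trimmed_mean(guesses))   # 1080.0 -- the untrimmed mean is about 2092
```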

In Moneyball, algorithms destabilized the status hierarchy. You remember in the movie there was this nerdy kid amid the seasoned older baseball scouts, and the nerdy kid’s numbers were more accurate than the scouts’ judgments. It created a lot of friction. This is a recurring theme in the psychological literature—the tension between human-based forecasting and machine- or algorithm-based forecasting. It goes back to 1954, when Paul Meehl wrote his book on clinical versus statistical prediction, in which the predictions of clinical psychologists and psychiatrists were compared to various simple algorithms. Over the 58 years since, there have been hundreds of studies comparing human-based prediction to algorithm- or machine-based prediction, and the track record doesn’t look good for people. People just keep getting their butts kicked, over and over again.
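A classic example of the algorithmic side of this literature is Robyn Dawes’s “improper linear models”: standardize each predictor, weight them all equally, and add. The sketch below is illustrative, not from the source, and the applicant data are invented:

```python
from statistics import mean, stdev

def unit_weight_scores(rows):
    """Dawes-style "improper linear model": z-score each predictor column,
    weight every predictor equally, and sum. No fitted weights, no intuition."""
    cols = list(zip(*rows))
    zcols = [[(x - mean(col)) / stdev(col) for x in col] for col in cols]
    return [sum(zs) for zs in zip(*zcols)]

# Hypothetical applicants: (test score, GPA, interview rating).
applicants = [(620, 3.1, 4), (710, 3.8, 2), (550, 2.9, 5), (680, 3.5, 3)]
for row, score in zip(applicants, unit_weight_scores(applicants)):
    print(row, round(score, 2))
```

Dawes’s point, in keeping with Meehl’s, was that even these crude unfitted formulas tend to match or beat expert clinical judgment.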

Source: The New Science, Philip Tetlock

"A gilded No is more satisfactory than a dry yes" - Gracian