13 Mental Models From Statistics

1. Nash Equilibrium

[Image: the bar scene from ‘A Beautiful Mind’]

Each person in a group makes the optimal decision for himself, based on what he thinks others will do, and nobody can improve his own outcome by changing his choice alone.

In the movie ‘A Beautiful Mind’, John Nash, played by Russell Crowe, explains game theory in an unusual way. He is at a bar with three friends, and they are all enchanted by a beautiful blonde who walks in with four brunette friends.

His friends banter about which of them will be able to woo the blonde, but Dr. Nash concludes that they should ignore her instead. “If we all go for the blonde, we block each other and not a single one of us is going to get her. So then we go for her friends, but they will all give us the cold shoulder because nobody likes to be second choice. But what if no one goes to the blonde? We don’t get in each other’s way and we don’t insult the other girls. That’s the only way we win.”

The bar scene is only a loose illustration of how the Nash equilibrium works; a better example is the classic Prisoner’s Dilemma. It goes as follows: two accomplices are locked in separate cells. Each can either confess or stay silent, and the police lay out three possible outcomes:

If both confess, both will be jailed for five years.
If only one confesses, he is freed but his friend goes to jail for ten years.
If neither confesses, both will be charged with a minor offense and will be jailed for two years.

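Confessing is the best option for each prisoner, whatever the other does. If he confesses, he is either freed or gets 5 years at worst. But if he doesn’t confess, he gets either 2 or 10 years in jail. Both therefore confess and serve five years each, even though both would have been better off staying silent. Neither can do better by changing his choice alone, which makes this the Nash equilibrium.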
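The reasoning above can be checked mechanically. The following is a minimal sketch, not a general game-theory solver: it encodes the jail terms from the three outcomes listed above and tests every pair of choices to see whether either prisoner could shorten his own sentence by switching alone. The names `YEARS` and `is_nash_equilibrium` are made up for this illustration.

```python
from itertools import product

CONFESS, SILENT = "confess", "silent"
CHOICES = (CONFESS, SILENT)

# Years in jail for (prisoner A, prisoner B), keyed by (A's choice, B's choice).
YEARS = {
    (CONFESS, CONFESS): (5, 5),
    (CONFESS, SILENT): (0, 10),
    (SILENT, CONFESS): (10, 0),
    (SILENT, SILENT): (2, 2),
}

def is_nash_equilibrium(a_choice, b_choice):
    """True if neither prisoner can cut his own sentence by switching alone."""
    a_years, b_years = YEARS[(a_choice, b_choice)]
    a_can_improve = any(YEARS[(alt, b_choice)][0] < a_years for alt in CHOICES)
    b_can_improve = any(YEARS[(a_choice, alt)][1] < b_years for alt in CHOICES)
    return not (a_can_improve or b_can_improve)

equilibria = [pair for pair in product(CHOICES, repeat=2) if is_nash_equilibrium(*pair)]
print(equilibria)  # [('confess', 'confess')]
```

Mutual silence is not an equilibrium here because either prisoner could walk free by confessing while the other stays quiet.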

2. Permutations and Combinations

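Permutations and combinations help us count the ways things can be arranged or chosen. A permutation is an arrangement in which order matters; a combination is a selection in which it does not.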
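A short sketch of the difference, using Python's standard library. The runner names and the podium/squad framing are invented for the example; `math.perm` and `math.comb` require Python 3.8 or later.

```python
import math
from itertools import combinations, permutations

runners = ["Ana", "Ben", "Cho", "Dev"]  # hypothetical names

# Permutations: ordered podiums (gold, silver, bronze) from 4 runners.
podiums = list(permutations(runners, 3))
# Combinations: unordered 3-person squads from the same 4 runners.
squads = list(combinations(runners, 3))

print(len(podiums), math.perm(4, 3))  # 24 24
print(len(squads), math.comb(4, 3))   # 4 4
```

Each squad of three can stand on the podium in 3! = 6 different orders, which is why there are six times as many permutations as combinations.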

3. Bell Curve/Normal Distribution

Many things follow a normal distribution, whose values trace out a bell curve: a concentration of values in the middle and far fewer at the extremes. Human height and weight roughly fit a normal distribution, but wealth does not (a large share of wealth is concentrated in a small group at the upper extreme).
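To make "concentrated in the middle" concrete, here is a small sketch using `statistics.NormalDist` from the standard library. The mean of 170 cm and standard deviation of 10 cm are assumed values chosen for illustration, not measured data.

```python
from statistics import NormalDist

# Hypothetical adult heights: mean 170 cm, standard deviation 10 cm.
MEAN, SD = 170, 10
heights = NormalDist(mu=MEAN, sigma=SD)

def share_within(k):
    """Fraction of values within k standard deviations of the mean."""
    return heights.cdf(MEAN + k * SD) - heights.cdf(MEAN - k * SD)

for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.1%}")
# within 1 sd: 68.3%
# within 2 sd: 95.4%
# within 3 sd: 99.7%
```

Under these assumptions, fewer than 3 people in 1,000 would fall more than 30 cm from the average height.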

4. Law of Large Numbers

In probability, the more times a trial is repeated, the closer the average result tends to come to the expected value. Casinos make money because of the Law of Large Numbers. Even if they lose money in the short run, after enough bets the house edge built into each game emerges in the results and they profit.
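The casino's position can be simulated. This is a rough sketch that assumes a European wheel with a single green zero (37 pockets) and a player who always bets one unit on red; the function name and the seed are arbitrary choices for the illustration.

```python
import random

def average_player_profit(spins, rng):
    """Average profit per 1-unit bet on red over `spins` spins."""
    total = 0
    for _ in range(spins):
        pocket = rng.randrange(37)  # 0 is green, 1-36 are red or black
        # Simplification: treat pockets 1-18 as the 18 red pockets.
        total += 1 if 1 <= pocket <= 18 else -1
    return total / spins

rng = random.Random(0)
expected = (18 - 19) / 37  # about -0.027 per unit bet
for spins in (10, 1_000, 100_000):
    print(spins, round(average_player_profit(spins, rng), 4), "expected", round(expected, 4))
```

With only 10 spins the player's average can land almost anywhere, but with 100,000 spins it sits close to the small negative expected value, which is the casino's profit.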

5. Regression to the Mean

An extreme result tends to be followed by one closer to the average, because part of any extreme result is luck that is unlikely to repeat. This is closely related to the Law of Large Numbers. Failure to understand it may fool us into seeing causes where none exist. A basketball player who hits his first 20 free throws and then misses his next 10 (right after he changed the color of his headband) is not suddenly jinxed.
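A simulation makes the free-throw point visible. In this sketch every player is assumed to have exactly the same true hit rate of 75 percent, so any difference between players is pure luck; the rate, the number of players, and the seed are all made-up parameters.

```python
import random

rng = random.Random(42)
TRUE_RATE, SHOTS, PLAYERS = 0.75, 20, 5_000

def made_shots():
    """Number of free throws made out of SHOTS attempts."""
    return sum(rng.random() < TRUE_RATE for _ in range(SHOTS))

first = [made_shots() for _ in range(PLAYERS)]
second = [made_shots() for _ in range(PLAYERS)]

# Players who looked "hot" in round one (19 or 20 makes out of 20).
hot = [i for i, made in enumerate(first) if made >= 19]
hot_first = sum(first[i] for i in hot) / len(hot)
hot_second = sum(second[i] for i in hot) / len(hot)
print(f"hot players, round 1: {hot_first:.1f} / {SHOTS}")
print(f"same players, round 2: {hot_second:.1f} / {SHOTS}")
```

The players selected for an exceptional first round score close to the ordinary 15 out of 20 in the second round, with no headband involved.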

6. Power Laws

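A power law describes a relationship in which one quantity varies as a fixed power of another. This is not the same as exponential growth, although the two are often confused. A Pareto distribution is an example of a power law. The Richter scale is often mentioned alongside power laws, but it is strictly a logarithmic scale: each whole-number step marks a tenfold increase in measured shaking amplitude, so a 9 registers 10x the amplitude of an 8, and an 8 registers 10x that of a 7.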

Concepts related to power laws include diminishing returns, fat-tailed distributions, and the long tail.
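A small sketch of the defining property of a power law, written as y = k * x**a: multiplying x by a fixed factor multiplies y by a fixed factor, no matter where you start. The constants k and a below are arbitrary illustrative values.

```python
import math

def power_law(x, k=2.0, a=-1.5):
    """y = k * x**a; k and a are arbitrary illustrative constants."""
    return k * x ** a

# Doubling x multiplies y by the same factor (2**a) at every scale.
ratios = [power_law(2 * x) / power_law(x) for x in (1, 10, 1_000)]
print(ratios, 2 ** -1.5)
```

This scale-free behavior is what produces long tails: very large values become rarer, but only gradually.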

7. Pareto Principle

A pattern in nature in which most of the effects come from a small number of causes, or most of the rewards go to a minority of players. This is known as the 80/20 rule, but the exact numbers are only a rule of thumb and the real split is often more extreme. Over 97 percent of book sales in the U.S. are made by 20 percent of the authors.

“For all those who have, more will be given, and they will have an abundance; but from those who have nothing, even what they have will be taken away.” – The Matthew Effect
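For readers who want numbers, the sketch below uses the standard Lorenz-curve result for a Pareto distribution to compute what share of the total the top fraction of players holds. It assumes rewards follow an exact Pareto distribution with shape `alpha` greater than 1, which real data only approximates; the function name is made up for the example.

```python
import math

def top_share(p, alpha):
    """Share of the total held by the top fraction p under a Pareto
    distribution with shape alpha > 1 (from its Lorenz curve)."""
    return p ** (1 - 1 / alpha)

alpha_80_20 = math.log(5) / math.log(4)  # about 1.16, the shape that yields 80/20
print(round(top_share(0.20, alpha_80_20), 3))  # 0.8
print(round(top_share(0.01, alpha_80_20), 3))
```

Under the same assumption, the top 1 percent alone holds roughly half of the total, and a smaller `alpha` concentrates the rewards even further.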

8. Pareto Efficiency

An allocation is Pareto efficient when it is impossible to reallocate resources to make one person better off without making at least one other person worse off.
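The definition translates directly into a check. In this sketch each allocation is a tuple of payoffs, one per person, and the four options are hypothetical numbers invented for the example.

```python
def dominates(x, y):
    """x Pareto-dominates y: nobody is worse off and somebody is better off."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))

def pareto_efficient(allocations):
    """Keep the allocations that no other allocation dominates."""
    return [x for x in allocations if not any(dominates(y, x) for y in allocations)]

# Hypothetical payoffs (person 1, person 2) for four feasible allocations.
options = [(5, 5), (8, 2), (4, 4), (2, 8)]
print(pareto_efficient(options))  # [(5, 5), (8, 2), (2, 8)]
```

Only (4, 4) is inefficient, because moving to (5, 5) helps both people. Note that the lopsided (8, 2) counts as efficient: Pareto efficiency says nothing about fairness.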

9. Ergodicity

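An ensemble probability (the outcome across many people at one point in time) is different from a time probability (the outcome for one person across many points in time). Consider two examples.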

Example 1: One die rolled 1000 times has the same probabilities as 1000 dice rolled once; rolling a die is “ergodic”. But if the die gets chipped after 100 throws so that it’s likelier to roll a 3, then one die rolled 1000 times is no longer equivalent to 1000 dice rolled once (non-ergodic). It is a mistake to treat non-ergodic systems as ergodic.

Example 2: Taleb gives the example of the casino.

Imagine that on a given day, 100 people play roulette. We can safely assume that around 1 percent of the players will go bust. We can also assume that player number 27 going bust will not affect player 28’s chances. But for a single player who keeps playing, this is not true. If he goes bust after playing 27 games, it’s the end of the road for him.

The long-run returns of a market are a bad indicator of what a single investor should be doing. That is why recommendations from your investment guru or local bank can be dangerous when they reassure you that, on average, the long-term returns of a market trend upwards. A single investor who is wiped out along the way never collects that average.
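The gap between the average across many players and the experience of one player over time can be sketched with a simple repeated bet. This is not Taleb's casino example but a commonly used toy game, assumed here for illustration: each round a fair coin either multiplies your wealth by 1.5 or by 0.6. All parameters and the seed are made up.

```python
import random
import statistics

UP, DOWN, ROUNDS, PLAYERS = 1.5, 0.6, 100, 10_000

# Ensemble view: the average multiplier per round across many players.
ensemble_factor = (UP + DOWN) / 2   # 1.05, which looks like +5% per round
# Time view: the per-round growth one player experiences in the long run.
time_factor = (UP * DOWN) ** 0.5    # about 0.95, a loss per round

rng = random.Random(7)

def final_wealth():
    """Wealth of one player who starts with 1.0 and plays every round."""
    wealth = 1.0
    for _ in range(ROUNDS):
        wealth *= UP if rng.random() < 0.5 else DOWN
    return wealth

outcomes = [final_wealth() for _ in range(PLAYERS)]
print("ensemble factor:", ensemble_factor, "time factor:", round(time_factor, 3))
print("median player ends with:", statistics.median(outcomes))
print("share of players who lost money:", sum(w < 1 for w in outcomes) / PLAYERS)
```

The bet looks favorable on average, yet the typical player ends up with almost nothing. The attractive average is carried by a handful of very lucky players, which is the sense in which the game is non-ergodic.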

10. The Monte Carlo Fallacy (The Gambler’s Fallacy)

Just because something deviates from the average does not make it more likely to return to the average on the next try. This is different from the Law of Large Numbers. Consider the casino again. If you play roulette, you should expect “black” and “red” numbers to come up about equally often after enough spins of the wheel. But “enough” is not 10, 20, or even 30; it’s hundreds or thousands of spins.

At the Monte Carlo Casino on August 18, 1913, the ball fell on black 26 times in a row. This is statistically an extremely rare event, but it is no rarer than any of the other 67,108,863 possible sequences of red and black. Gamblers lost millions of francs because they falsely believed that the next number had to be red after so many black numbers in a row. The wheel has no memory, so each spin was as likely to land on black as the one before. The gamblers also underestimated what constitutes a “large number.”
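Both claims can be checked with a few lines of code. The sketch below first counts the possible red/black sequences of length 26, then simulates a simplified wheel (no green pocket, an assumption made to keep black and red exactly equally likely) and measures how often red follows a run of five blacks. The streak length and seed are arbitrary.

```python
import random

print(2 ** 26 - 1)  # 67108863 other red/black sequences of length 26

rng = random.Random(1913)
# Simplified wheel with no green pocket: black and red are equally likely.
spins = [rng.choice("BR") for _ in range(500_000)]

STREAK = 5
# Collect the outcome of every spin that directly follows STREAK blacks.
after_streak = [
    spins[i] for i in range(STREAK, len(spins))
    if all(s == "B" for s in spins[i - STREAK:i])
]
red_rate = sum(s == "R" for s in after_streak) / len(after_streak)
print(f"red after {STREAK} blacks in a row: {red_rate:.3f}")
```

The rate of red after a streak of blacks stays close to one half: a streak does not make red "due".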

11. Local vs Global Optimum

A local optimum is the best solution within a limited set of nearby solutions. In contrast, a global optimum is the best solution among all possible solutions. A person who puts a premium on free thinking, and who sees only two choices, getting a PhD or being an employee, would select the first since it would afford him more time to think freely. This is the local optimum.

But the best possible solution may be something completely different, such as starting his own business. That would be the global optimum.
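The same trap shows up in search algorithms. The sketch below uses simple hill climbing, which only ever looks at neighboring options, on a made-up list of "satisfaction" scores. Both the scores and the function name are hypothetical.

```python
def hill_climb(values, start):
    """Step to the higher neighbor until no neighbor is higher."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
        best = max(neighbors, key=lambda j: values[j])
        if values[best] <= values[i]:
            return i
        i = best

# Hypothetical "satisfaction" scores for a line of neighboring options.
landscape = [1, 3, 5, 4, 2, 6, 9, 12, 8]

local_peak = hill_climb(landscape, start=0)
global_peak = max(range(len(landscape)), key=lambda j: landscape[j])
print(local_peak, landscape[local_peak])    # 2 5
print(global_peak, landscape[global_peak])  # 7 12
```

Starting from the left, the climber stops at a score of 5 because every neighboring step looks worse, even though a score of 12 exists further along. Finding it requires looking beyond the immediate options.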

12. Simpson’s Paradox

A paradox in probability and statistics in which a trend that appears in several groups of data disappears or reverses when the groups are combined (or the other way around).

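In 1973, admission figures of the University of California, Berkeley showed that men were being admitted at a significantly higher rate than women. This looked like clear evidence of bias in favor of men. But when the 85 departments were examined individually, 6 turned out to be significantly biased against men whereas only 4 were significantly biased against women. The overall gap arose largely because women tended to apply to more competitive departments with lower admission rates.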
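A toy version of the admissions case shows how the arithmetic works. The counts below are invented for illustration and are NOT the real Berkeley figures: two departments, one easy to get into and one hard, with women applying mostly to the hard one.

```python
# Hypothetical (admitted, applicants) counts, NOT the real Berkeley data.
data = {
    "easy dept": {"men": (80, 100), "women": (18, 20)},
    "hard dept": {"men": (4, 20), "women": (30, 100)},
}

def rate(admitted, applicants):
    """Admission rate as a fraction."""
    return admitted / applicants

for dept, groups in data.items():
    print(dept, {g: rate(*counts) for g, counts in groups.items()})

def overall(group):
    """Admission rate for one group across all departments combined."""
    admitted = sum(d[group][0] for d in data.values())
    applicants = sum(d[group][1] for d in data.values())
    return rate(admitted, applicants)

print("overall", {"men": overall("men"), "women": overall("women")})
```

In this made-up data women are admitted at a higher rate than men in both departments (90 vs 80 percent and 30 vs 20 percent), yet the combined rate is 70 percent for men and 40 percent for women.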

13. Bayes’ Theorem

Knowledge of conditions related to an event can be used to make a better estimate of its probability. On average, the height of men in Fake Country is 5 feet 7 inches. If you wanted to estimate the adult height of a boy born in Fake Country today, and you knew that both of his parents were taller than 6 feet 4 inches, you would not say 5 feet 7 inches. You would revise your estimate upward.
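The update can be written out with Bayes' theorem, P(H | E) = P(E | H) P(H) / P(E). The sketch below recasts the height example as a yes/no question (will the boy grow taller than 6 feet?), and all three input probabilities are made-up numbers for Fake Country.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """P(H | E) via Bayes' theorem, with P(E) expanded over H and not-H."""
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

# All three inputs are made-up numbers for Fake Country.
p_tall = 0.05                   # prior: share of men who grow taller than 6 ft
p_tall_parents_if_tall = 0.40   # tall men who had two very tall parents
p_tall_parents_if_not = 0.02    # other men who had two very tall parents

print(round(posterior(p_tall, p_tall_parents_if_tall, p_tall_parents_if_not), 3))  # 0.513
```

With these assumed inputs, knowing about the parents moves the estimate from a 5 percent chance to roughly a 51 percent chance that the boy ends up taller than 6 feet.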