
Probably Means Something Different to Everyone


/ METADATA
DATE: 2026.4.21
AUTHOR: SARATH THARAYIL
READING TIME: 10 MIN READ
CATEGORIES: Language, Probability, Cognition
/ ARTICLE

There is a sentence you have probably said this week without thinking too hard about it. Something like "there is a good chance" or "that seems unlikely." The other person nodded. You moved on. Everyone felt understood.

Here is the problem with that. That nod does not mean agreement. It just means neither of you stopped to check.

Wade Fagen-Ulmschneider at the University of Illinois ran a survey asking 123 people to put a number between 0 and 100 on seventeen common probability phrases. The results are not just interesting. They are a little unsettling.

The survey that started this

The setup is beautifully simple. You see a phrase like "highly likely." You type in whatever percentage you think it represents. That is it.

What you get when you look at the full dataset is a picture of how wildly people diverge, even for phrases they use every day without hesitation. The chart below shows the real numbers from those 123 responses. Each row is one phrase. The box covers where the middle 50% of responses landed. The tick marks the median.

[Chart: "Probability Perception" — for each of the seventeen phrases, from "Almost Certain" down to "Chances are Slight," a box plot of the 123 responses on a 0–100% scale.]
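The summary statistics behind a chart like this are straightforward to compute. A minimal sketch, using made-up responses for two phrases rather than the real survey data:

```python
# Per-phrase summary: the median and the middle-50% band of responses.
# The numbers here are illustrative, not the actual survey responses.
import statistics

responses = {
    "About Even": [45, 50, 50, 50, 50, 50, 55],
    "We Believe": [55, 60, 65, 70, 75, 80, 90],
}

def summarize(values):
    """Return (25th percentile, median, 75th percentile) of the responses."""
    q1, med, q3 = statistics.quantiles(sorted(values), n=4)
    return q1, med, q3

for phrase, vals in responses.items():
    q1, med, q3 = summarize(vals)
    print(f"{phrase:12s} median={med:.0f}%  middle 50% spans {q1:.0f}-{q3:.0f}%")
```

Even on fake data, the two shapes the chart shows fall out immediately: "about even" collapses to a point, "we believe" spreads across 20 points in the middle half alone.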

Phrases that do their job

A handful of phrases work pretty well as signals. "About even" is almost universal at 50%. Nobody needs to think about it because everybody has the same reference point: a coin. There is nothing to interpret. "Better than even" is similarly tight, sitting in the mid-50s with a narrow spread.

These phrases anchor to something concrete. They carry shared meaning because they have an implicit reference that almost everyone uses the same way.

Phrases that carry almost no information

Then there are phrases like "we believe," "we doubt," and "unlikely." These sit inside boxes that span 20 percentage points or more in the middle half of responses alone. The outer tails are worse.

When you say "we doubt this will happen," one reader hears something like 10% probable. Another hears something like 30% probable. Both of them would claim they understood you. Neither of them is wrong about what the phrase feels like. They just resolved the ambiguity in different directions.

The phrase is not carrying probability. It is carrying direction and vague emotional weight. That is useful sometimes. But it is not the same as communication.

Sherman Kent and the CIA's sixty-year-old problem

This is not a new problem. It has been documented carefully since at least 1951.

That year, a NATO planning document described the likelihood of a Soviet attack on Yugoslavia as "a serious possibility." After the document circulated, a researcher went back and asked the officers who had read and signed off on it what probability they had taken "serious possibility" to mean. The answers ranged from 20% to 80%.

That is a 60-point gap on a document assessing the likelihood of a military attack on an allied country.

Sherman Kent, a senior CIA analyst, saw this problem clearly and spent years trying to fix it. In a 1964 paper titled "Words of Estimative Probability" published in the journal Studies in Intelligence, he proposed a standardized scale. Every phrase would have a defined numerical range. "Highly likely" would mean 87 to 93%. "Probable" would mean 63 to 87%. "Unlikely" would mean 13 to 37%. Analysts across every report would use the same vocabulary.
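Kent's proposal is essentially a lookup table: a phrase is legal only if it resolves to a defined range. A minimal sketch of the idea, using just the three ranges quoted above (his full scale covered more terms):

```python
# Sketch of Kent's idea: every estimative phrase maps to a defined range.
# Only the three ranges mentioned in the text are included here.
KENT_SCALE = {
    "highly likely": (87, 93),
    "probable": (63, 87),
    "unlikely": (13, 37),
}

def estimative_range(phrase):
    """Return the (low, high) percent range a phrase is defined to mean."""
    try:
        return KENT_SCALE[phrase.lower()]
    except KeyError:
        raise ValueError(f"{phrase!r} has no defined range in this scale")

print(estimative_range("Probable"))  # (63, 87)
```

The point of the `ValueError` branch is the whole proposal in miniature: under Kent's system, a phrase outside the agreed vocabulary is not vivid writing, it is an error.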

The CIA declined to formally adopt it. The objections were that numbers implied precision that intelligence estimates could not honestly claim, that different contexts called for different calibrations, and that analysts simply did not want the constraint. Kent published his frustration about this in the same journal a few years later. The problem he identified never went away.

What is striking about Fagen-Ulmschneider's 2024 survey is how stable the pattern has remained across all that time. The phrases cluster in nearly the same positions as they did in Kent's era. A 2015 Reddit survey by /u/zonination found the same structure, attracting tens of thousands of engagements because so many people recognized it immediately from their own experience. The disagreement is not a quirk of one dataset. It is something about how human language handles uncertainty.

The same gap shows up in medicine

If this seems like a problem only for intelligence analysts, medicine has the same issue, in a context where the stakes are immediate and personal.

What "rare" actually means

Philip O'Brien's 1989 analysis of probability language among medical professionals found nearly identical results to what Kent had documented in intelligence assessments. Physicians interpreting phrases like "likely" mapped them to a range from 55% to 90% depending on who was interpreting and in what context. "Frequently" and "rarely" had similar ranges.

This matters for patients in a specific and underappreciated way. When a doctor tells you that a side effect is "rare," what do you hear? Something like one in a million, probably. Something extremely uncommon, maybe so uncommon it barely warrants consideration.

The FDA has a formal definition of "rare" for adverse event reporting: between 1 in 1,000 and 1 in 10,000. "Uncommon" means 1 in 100 to 1 in 1,000. "Very rare" is less than 1 in 10,000. Most patients do not know this. Studies have consistently shown that patients presented with a verbal label ("rare") make different treatment decisions than patients given the equivalent percentage, even when those descriptions refer to the same underlying probability. The word and the number are supposed to convey the same information, but they do not land the same way.

When a physician says "there is a small chance of this complication," both physician and patient feel like communication happened. The patient's version of "small chance" and the physician's version may not overlap at all.

How climate science handled the problem

The Intergovernmental Panel on Climate Change went through a long and somewhat painful process of learning exactly this lesson at global scale.

For years, IPCC assessment reports used natural language to describe the likelihood of various climate scenarios. Journalists, policymakers, and readers would interpret those phrases and use them to justify or dismiss action. The problem was that "likely" in an IPCC report and "likely" in a newspaper editorial were not the same thing. Readers were free to resolve the ambiguity in whatever direction felt comfortable.

Starting with the Third Assessment Report in 2001, the IPCC began publishing an explicit calibration scale alongside their reports. They now define it clearly: "virtually certain" means 99 to 100% probability. "Very likely" means 90 to 100%. "Likely" means 66 to 100%. "About as likely as not" covers 33 to 66%. "Unlikely" goes from 0 to 33%. "Exceptionally unlikely" means 0 to 1%.
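One subtlety of the IPCC scale is that its bands nest rather than partition: a 95% probability sits inside both the "very likely" and the "likely" ranges. A small sketch, using the ranges quoted above, makes that concrete:

```python
# The IPCC bands from the text, as data. The bands nest, so a classifier
# returns every matching term rather than exactly one.
IPCC_SCALE = {
    "virtually certain": (99, 100),
    "very likely": (90, 100),
    "likely": (66, 100),
    "about as likely as not": (33, 66),
    "unlikely": (0, 33),
    "exceptionally unlikely": (0, 1),
}

def matching_terms(probability):
    """Return all IPCC terms whose defined band contains the probability."""
    return [term for term, (lo, hi) in IPCC_SCALE.items()
            if lo <= probability <= hi]

print(matching_terms(95))  # ['very likely', 'likely']
```

The nesting is deliberate: an author who can defend "very likely" may also use the weaker "likely," so the stronger claim always implies the weaker one.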

This scale appears in every report. Every use of those terms is mapped to one of those defined ranges. They made their vocabulary into a system.

And yet studies of how journalists and policymakers actually interpret IPCC language still find gaps between intended and received meaning. Defining the scale helps. It does not close the problem entirely, especially for readers who encounter the language without reading the definitions first, which is most readers most of the time.

Why everyone rounds and what that reveals

The Fagen-Ulmschneider dataset has one additional detail worth sitting with. Of all the numerical estimates submitted in the survey, 85.8% ended in either 0 or 5. When someone is asked to put a probability on "probably," they do not say 68%. They say 70%. Or 65%. Whichever round number is nearest to wherever their intuition landed.

This is revealing about what is actually happening when people answer these questions. They are not reporting a carefully calibrated estimate. They are converting a fuzzy feeling into the nearest comfortable landmark. Their actual intuition might be anywhere from 63% to 72%, and it comes out as 70% because that is the nearest round number.
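The rounding pattern is easy to test against any set of numeric estimates: count how many end in 0 or 5. A quick sketch, with made-up numbers standing in for survey responses:

```python
# What fraction of integer estimates land on a round-number landmark?
# The sample below is invented for illustration, not survey data.
def share_of_round_numbers(estimates):
    """Fraction of integer estimates whose last digit is 0 or 5."""
    rounded = sum(1 for e in estimates if e % 5 == 0)
    return rounded / len(estimates)

sample = [70, 65, 80, 68, 75, 50, 90, 72, 85, 95]
print(f"{share_of_round_numbers(sample):.0%} end in 0 or 5")  # 80%
```

If estimates were uniformly distributed over all integers, only about 20% would end in 0 or 5; the survey's 85.8% is the signature of landmark-seeking, not calibration.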

Now layer that on top of the interpersonal variation. Person A says "probably" and means something they would round to 70%. Person B hears "probably" and maps it to something they would round to 80%. If you asked both of them whether they understood each other, they would say yes. If you asked them to put numbers on it, you would see the gap.

A useful check before you write

Before using a probability phrase in something that matters, ask: if ten different people read this, what range of probabilities would they infer? If that range spans more than 15 percentage points, the phrase is carrying less signal than you probably think.
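That check can be written down literally. A hypothetical helper, using the 15-point threshold from above; the low and high inputs are your own guesses at how readers would interpret the phrase, not measured data:

```python
# A literal version of the pre-writing check: flag a phrase if the range of
# probabilities readers might plausibly infer spans more than 15 points.
def too_ambiguous(inferred_low, inferred_high, threshold=15):
    """True if plausible reader interpretations span more than `threshold` points."""
    return (inferred_high - inferred_low) > threshold

print(too_ambiguous(45, 55))  # "about even": False
print(too_ambiguous(10, 35))  # "we doubt": True
```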

Why switching to numbers is not a complete fix

The obvious response to all of this is to just use percentages. Say "there is roughly a 20% chance" instead of "unlikely." Stop hiding behind vague language.

This is better than the alternative in most cases, but it is not a clean solution either.

Specific numbers imply precision that the underlying estimate often cannot honestly claim. A weather forecast that says "23% chance of rain" sounds more reliable than one that says "around 25%," but the model producing it is not actually calibrated to single-digit accuracy. Presenting the number as 23% rather than "around a one-in-four chance" can give the audience a false sense of the estimate's trustworthiness. You swap one kind of imprecision for a different kind of false precision.

There is also solid research in risk communication showing that verbal labels and percentages trigger different reasoning styles in different people. Some audiences engage more carefully with verbal language. Others respond better to numbers. Neither format prevents motivated reasoning. If someone is predisposed to downplay a risk, they will interpret "23%" as "not even one in four, so probably nothing to worry about" just as readily as they interpret "unlikely" to mean the same thing. The format changes. The behavior does not necessarily follow.

What actually helps

There is no elegant solution here. The problem is partly linguistic (probability vocabulary was never designed to be precise), partly cognitive (people round, anchor to prior expectations, and interpret vague language through their own priors), and partly social (speakers and listeners share the feeling of mutual understanding without checking whether it is real).

What consistently works in contexts where precision matters is what the IPCC did: define your terms explicitly, at the start, and then hold to those definitions throughout. If your document uses "likely" to mean above 70%, say so once in the introduction. If your risk assessment uses "probable" to mean the outcome would surprise you if it did not happen, say that. This is not elegant prose. It feels bureaucratic. But it closes the gap between what the writer intends and what the reader receives.

The deeper issue is the assumption that sits underneath all of this. Most of the time when we use probability phrases, we assume the other person is reading them at roughly the same probability we intended. The data going back to Sherman Kent's 1951 NATO example says that assumption is wrong more often than we notice.

The phrases feel like communication. Most of the time, they are.

Sometimes they are not, and the gap only shows up when a decision gets made and something goes wrong and everyone discovers they had been using the same words to mean different things all along.

The raw survey data is from Wade Fagen-Ulmschneider's Perception of Probability Words project at the University of Illinois. Sherman Kent's original paper is available through the CIA's FOIA reading room.


Sarath Tharayil
/ THAT'S A WRAP

Have a great day.

Thanks for reading all the way to the end.