The Little Vulgar Book of Mechanics (v0.17.0) - Probability I

Last updated: June 11th 2022

Just updated this section of the book: Probability I

Probability I

"We have to come back to something like ordinary language after all when we want to talk "about" mathematics!" – Sir Harold Jeffreys (1891–1989)

Probability can be confusing because everyone – including myself, but also your school teacher, your textbook, and even your favorite "probability rockstar" in pop-sci circles – talks about probabilities using phrasing that gives many people the wrong idea, even in "technical" contexts.

I have three goals for this introduction:

  1. Tell you why Probability exists.
  2. Give you a couple of examples of why loose language can confuse people.
  3. Clarify what one actually means when one talks about probabilities.

There will be no function graphs, no numbers, no sets, and no combinatorics talk in this section. All of those things will enter the picture in the following sections. I will, though, introduce some basic symbols (without the numbers) at the very end.

To understand why Probability exists, let's first see a case where we don't use it. Consider the following proposition:

"John Petrucci owns a 7-string electric guitar."

For most practical purposes, that proposition is going to be either False or True, and that'll be the end of it. In a computer program, a "boolean" data type would suffice to describe the state of knowledge about it.

But sometimes we need more than True or False. Consider this question:

  1. Was the UFO reported by the geologist at the Antarctica research station an alien spaceship?

When confronted with such questions, we usually don't have enough information to be able to say either True or False, so we want something more granular, which we can use while in the process of discovering the truth – i.e. the process of arriving at either True or False (at least temporarily). Something to use as we accumulate/improve data from observations, experience, etc.

And that is why Probability exists: To represent, with mathematical and logical rigour, the different stages of what we might call "truth discovery," which in common parlance we express with phrases such as:

...and so on.

Since True and False are not useful enough as values for such purposes, we use numbers between 0 and 1 (never actually 0 or 1, cos they're just the equivalents of good ol' False and True respectively), along with certain operations, to engage in "probabilistic" reasoning and deductions. In this sense, Probability is an extension of Logic.
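To make the boolean-versus-number distinction concrete, here is a minimal Python sketch. The `Belief` class and both variable names are made up for illustration (they are not part of any library), and the 0.416 is an arbitrary placeholder, not a computed value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    """Hypothetical wrapper: a degree of belief strictly between 0 and 1."""

    value: float

    def __post_init__(self):
        # 0 and 1 are excluded on purpose: they are just False and True.
        if not 0.0 < self.value < 1.0:
            raise ValueError("certainty is a plain bool, not a Belief")


# A settled proposition fits in a boolean (value assumed for illustration).
petrucci_owns_7_string: bool = True

# An unsettled proposition needs something more granular.
ufo_was_alien_spaceship = Belief(0.416)  # placeholder number, not derived
```

Trying `Belief(1.0)` or `Belief(0.0)` raises a `ValueError`, which is the sketch's way of saying "that's a job for True and False."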

We also follow strict rules for calculation, so in the study of Probability there's also a calculus to be learned – in fact, a century ago French treatises would talk about "Calculus of Probabilities" instead of "Probability Theory" as we call it nowadays – which we will study later on.

(The big c, Calculus, will also enter the picture later, but in the previous paragraph I'm talking about Probability Theory having a "calculus" too. A calculus is any set of rules for calculation in some context. E.g. Propositional calculus is the system that specifies how to make inferences in Logic.)

Of course, it's not about pulling arbitrary numbers out of your ass to express your feelings about some guesses you have. I.e. you don't just pick 0.416 out of the blue to express how likely you think it is that the UFO from our Antarctica researcher was an alien spaceship. You must explain why 0.416 and not, say, 0.415. So we will learn that probabilities are derived from data (or, at the very least, from some extremely common, common sense, near-truth/near-false, "for all practical purposes"-type assumptions).

(Spoiler alert: Having methods and algorithms to count things is going to help a lot. Also: There Will Be Fractions. And numerators and denominators in said fractions will be based on counted things. And from such fractions, and operations on them, you'll arrive at specific values such as 0.416.)
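As a rough preview of what "fractions based on counted things" could look like, here is a small Python sketch. The function name and the counts 52 and 125 are assumptions picked only so the result lands on 0.416; they do not come from any real data set.

```python
from fractions import Fraction


def probability_from_counts(favorable: int, total: int) -> Fraction:
    """Build a probability as a fraction of two counted quantities."""
    if total <= 0 or not 0 <= favorable <= total:
        raise ValueError("need 0 <= favorable <= total and total > 0")
    return Fraction(favorable, total)


# Hypothetical counts, chosen purely for illustration.
p = probability_from_counts(favorable=52, total=125)
print(p, float(p))
```

The point of keeping a `Fraction` around instead of a bare float is that the numerator and denominator stay visible, so anyone can ask where each count came from.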

OK that's enough of why we use Probability. Now let's talk about the shit we all say irresponsibly. The loose way of talking about probabilities which can and does confuse others, and even ourselves.

Consider this:

"The object encountered by the geologist has a probability of 1 in 55000 of being an alien spaceship."

Do you see anything peculiar about that proposition?

I know what you're thinking: It is a "probabilistic" proposition. Yes, it's meant to be. But here's what's "peculiar" about it: It doesn't make sense!

Can you see why? Let's see if it gets better if I rephrase it this way:

  1. "The geologist has a chance of 1 in 55000 of having seen an alien spaceship."

What do you think? Does it make sense now? Let's put the two next to each other:

  1. "The object encountered by the geologist has a probability of 1 in 55000 of being an alien spaceship."
  2. "The geologist has a chance of 1 in 55000 of having seen an alien spaceship."

Which one do you think is more accurate?

The right answer is: Neither! Both propositions are examples of the wrong speak I'm talking about. Both make the mistake of talking about some mythical substance. Who "has" the probability? The geologist? The UFO? The answer is neither. Because here's the thing: There is no such thing as a probability.

By which I mean: Nobody "has" a probability. No object does. No event does. Both examples above are trying to express the same idea, but they're being loose with language in the same way: They seem to say that a probability is somehow a property, or attribute, of a person, or some UFO, or event.

Another example:

"There is a 30% probability that Zander Noriega's next song is good."

Well, my next song doesn't even exist, so that "30% probability" certainly can't be a property of it. An entity that doesn't exist can't have any property.

Which leads us to the second lesson in this introduction: Probability is not a property of anything. A probability is a measure of an observer's uncertainty.

That's good news, though. I mean, what would you prefer?

  1. Probability as a mystical substance somehow reified, floating around, being embedded in objects, people, events, as an imaginary property.
  2. Probability as a rigorously computed number, reflecting someone's degree of uncertainty with respect to some proposition about the world.

Call me crazy, but I'm glad we got the latter. No mysticism here (though plenty of belief all over the place, as I'll explain next). But you can see why loose language can confuse people into thinking Probability is a mystical substance. Here's one last example, coming from a guy who constantly reminds everyone that he is a master probabilist:

"...the risk of being killed as a pedestrian is one per 47,000 years."

See his use of "the" risk. Implying there's "the" probability of being killed as a pedestrian. But now you know that that's not a thing: No things have "the" probability of this and that. No cars, no pedestrians, no drivers, or roads. "The" probability is not a property of anything in spacetime.

What he means, or, what you should derive from his babble, is that, according to some data that he (presumably) has reviewed, plus his experience and so on, the quantification of his degree of belief in anyone getting killed by a car as a pedestrian is "1 in 47,000 years." And I'm sure somewhere in the calculation there's some total of street crossings per capita, per year, etc. And some "Exponentials," of course.

But what if this month you are living in a particularly busy and chaotic urban area, and have to cross particularly busy intersections on your way to work, and it's a month with particularly extreme weather, with slippery pavement and foggy vision? Is "the" probability of getting wrecked still "1 in 47,000"? Is the number provided by Mr. Probability Guru of any use to you?

Probably not. (See what I did there?) I mean, you could take his number, act on it as if it was "data," and/or go around regurgitating it. But there's no such thing as "the" probability of being killed as a pedestrian, or "the" probability of dying in a plane crash, or "the" probability of a UFO encounter.

There's only a calculation you can do, based on some data, leading to a number that will reflect your level of uncertainty. Other people might then just regurgitate your number, as if it was "data," but that's a different thing from your number being "the" probability, as some kind of property of the fabric of the Universe or whatever. Never ever forget: A probability is a number expressing someone's degree of belief in something.

Speaking of, let me finish this introduction with a word on "belief." A probability is either:

  1. Our degree of belief in a hypothesis H (from "hypothesis"), given some data D.
  2. Our degree of belief in data D, given our belief in a hypothesis H.

Respectively expressed symbolically:

\[P(H \mid D)\]

\[P(D \mid H)\]
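The two symbols are not interchangeable, and a quick Python sketch with made-up tallies can show it. Everything here (the `counts` table, the helper names, the numbers) is a hypothetical illustration, not data about anything real.

```python
from fractions import Fraction

# Made-up tallies of past cases, keyed by (hypothesis_true, data_observed).
counts = {
    (True, True): 8,
    (True, False): 2,
    (False, True): 90,
    (False, False): 900,
}


def is_h(key):
    """The hypothesis H holds in this case."""
    return key[0]


def is_d(key):
    """The data D was observed in this case."""
    return key[1]


def conditional(counts, target, given):
    """Degree of belief in `target`, restricted to cases where `given` holds."""
    given_total = sum(n for key, n in counts.items() if given(key))
    both = sum(n for key, n in counts.items() if given(key) and target(key))
    return Fraction(both, given_total)


p_h_given_d = conditional(counts, target=is_h, given=is_d)  # P(H | D)
p_d_given_h = conditional(counts, target=is_d, given=is_h)  # P(D | H)
print(p_h_given_d, p_d_given_h)
```

With these particular tallies, the belief in D given H comes out high while the belief in H given D comes out low, so mixing up the two directions would be a serious mistake.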

It's all about belief. Yes, belief. Not "facts." Facts are True and Right stuff. Which is handy, but, whether you like it or not, almost all of your actions are based on belief. Aside from mathematical theorems, "facts" are a minuscule part of your life. You don't really know much of anything for a fact.

You have very little idea of what's gonna happen in your day after you wake up (if you wake up). All kinds of substances, and physical and mental illnesses, mess with your perceptions and memories. Software and hardware bugs confuse your monitoring devices. Not to mention personal biases, fears, life goals, peer pressure, group think, and emotions in general. And that's not even taking into account that you might also just be dumb as a fence post.

Nonetheless, you still build things. And so do I. That's literally all I do, all day, every day: Create things. Engineer things. From music to software to meals. So we still need to at least compare the relative (un)certainties with regard to various possible beliefs. Probabilities encode our ever-changing incompleteness of information, and Probability Theory provides the logic and calculus for operating with them.

Here's a last quote by one of the biggest minds in history:

"The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful, none of which (fortunately) we have to reason on. Therefore the true logic for this world is the calculus of Probabilities, which takes account of the magnitude of the probability which is, or ought to be, in a reasonable man's mind." – James Clerk Maxwell (1850)

Let this be your welcome to Probability I.

In the following sections, we'll be looking at examples drawn from various areas covered in the rest of the book.

Books

See the rest of the book's WIP here.
