Introduction#

Probability is the study of the properties of random events.

Preliminaries#

Compound Union#

Symbolic Expression

\bigcup\limits_{i=1}^{n} A_{i} = A_1 \cup A_2 \cup ... \cup A_{n-1} \cup A_n

A symbol that represents the union of a sequence of sets.

Example

Let A_1, A_2, A_3 and A_4 be sets given by,

A_1 = \{ a, b, c \}

A_2 = \{ b, c, d \}

A_3 = \{ c, d, e \}

A_4 = \{ d, e, f \}

Then,

\bigcup\limits_{i=1}^{4} A_{i} = A_1 \cup A_2 \cup A_3 \cup A_4

= \{ a, b, c, d, e, f \}
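The compound union maps directly onto Python's built-in sets; `set().union(*A)` folds the union over the whole sequence, just as the big-cup symbol does. A minimal sketch mirroring the example above:

```python
# The sets A1..A4 from the example above.
A = [
    {"a", "b", "c"},
    {"b", "c", "d"},
    {"c", "d", "e"},
    {"d", "e", "f"},
]

# set().union(*A) takes the union over the whole sequence at once.
union = set().union(*A)
print(sorted(union))  # ['a', 'b', 'c', 'd', 'e', 'f']
```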

Summation
Symbolic Expression

\sum_{i=1}^n x_i = x_1 + x_2 + ... + x_{n-1} + x_n

Sometimes written as,

\sum_{x_i \in B} x_i

Where B is a set of elements.

A symbol that represents the sum of elements x_i.

Example

Let the set A be given by,

A = \{ 1, 2, 3, 4, 5 \}

Then,

\sum_{x_i \in A} x_i = 1 + 2 + 3 + 4 + 5 = 15
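The same sum can be sketched in Python, where summing over a set corresponds directly to the \sum_{x_i \in A} notation:

```python
# The set A from the example above.
A = {1, 2, 3, 4, 5}

# sum() iterates over the elements of A, mirroring the summation symbol.
total = sum(A)
print(total)  # 15
```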

Note

The sum \sum is only defined when the set being summed over contains only numerical elements. It makes no sense to talk about the sum of the elements of a set like,

A = \{ \text{ novels }, \text{ textbooks }, \text{ magazines } \}

Definitions#

Experiment

An uncertain event.

Mutual Exclusivity

A \cap B = \varnothing \implies \text{ A and B are mutually exclusive.}

Two sets, A and B, are mutually exclusive if they are disjoint.

Outcomes

x, y, z (lower case letters)

A possible way an experiment might occur.

Sample Space

S

The set of all possible outcomes for an experiment.

Note

The sample space is simply the Universal Set in probability’s Domain of Discourse.

Events

A, B, C (upper-case letters)

A_1, A_2, A_3, ..., A_{n-1}, A_n (upper-case letters with subscripts)

A subset of the sample space, i.e. a set of outcomes.

A \subseteq S \implies  \text{ A is an event }

Probability

P(A)

A numerical measure of the likelihood, or chance, that an event A occurs.

Sample Spaces and Events#

The sample space for an experiment is the set of everything that could possibly happen.

Motivation#

Note

By “fair”, we mean that all outcomes are equally likely.

Consider flipping a fair, two-sided coin. The only possible outcomes of this experiment are heads or tails. If we let h represent the outcome of a head for a single flip and t represent the outcome of a tail for a single flip, then the sample space is given by the set S,

S = \{ h, t \}

Events can be defined as subsets of the sample space. If we let H represent the event of a head and if we let T represent the event of a tail, then clearly,

H = \{ h \}

T = \{ t \}

Be careful not to confuse the outcome h with the event H, and likewise the outcome t with the event T. They have different, but related, meanings. The outcomes h and t are individual observables; they are physically measured by flipping a coin and observing on which side it lands. An event, on the other hand, is a set, and sets are abstract collections of individual elements. In this case, the sets are singletons, i.e. the sets H and T only contain one element each, which can lead to confusing the set for the outcome. Let us extend this example further, to put a finer point on this subtlety.

Consider now flipping the same fair, two-sided coin twice. A tree diagram can help visualize the sample space for this experiment. We represent each flip as a branch in the tree diagram, with each outcome forking the tree,

../../_images/sample_space_coin_flip.png

The outcomes of the sample space are found by tracing each possible path of the tree diagram and then collecting them into a set,

S = \{ hh, ht, th, tt \}

In this example, unlike the previous one, there is no longer a simple one-to-one correspondence between the events defined on the sample space and the outcomes within those events.

Take note, the sequence of outcomes ht is different from the sequence of outcomes th. In the first case, we get a head and then a tail. In the second case, we get a tail and then a head. Therefore, ht and th represent two different outcomes that can correspond to the same event. Let us call that event the set HT. HT represents the event of getting one head and one tail, regardless of order. Then, HT has exactly two outcomes (elements),

HT = \{ ht, th \}

When one of the outcomes ht or th is observed, we say the event HT occurs.
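The two-flip sample space and the event HT can be sketched with Python's built-in sets; a minimal sketch where each path through the tree diagram contributes one outcome string:

```python
from itertools import product

# Build the sample space for two flips: each flip contributes 'h' or 't',
# and each path through the tree diagram becomes one outcome string.
S = {"".join(path) for path in product("ht", repeat=2)}
print(sorted(S))  # ['hh', 'ht', 'th', 'tt']

# The event HT: one head and one tail, regardless of order.
HT = {outcome for outcome in S if sorted(outcome) == ["h", "t"]}
print(sorted(HT))  # ['ht', 'th']
```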

It is important to keep in mind the distinction between events and outcomes. The differences are summarized below,

  1. Outcomes are elements. Events are sets.

  2. Outcomes are observed. Events occur.

Compound Events#

A compound event is formed by composing simpler events with Operations.

Example

Consider the experiment of drawing a single card at random from a well-shuffled, standard playing deck. Let A represent the event of drawing a 2. Let B represent the event of drawing a heart.

The meaning of a few different compound events is considered below,

  1. A \cap B This compound event represents the event of getting a 2 of hearts.

  2. A \cup B This compound event represents the event of getting a 2 or a heart.

  3. A^c This compound event represents the event of getting any card except a 2.

  4. A \cap B^c This compound event represents the event of getting a 2 that is not a heart.

Classical Definition of Probability#

Returning to the experiment of flipping a fair coin once, we have a sample space and two events, H and T, defined on that sample space,

S = \{ h, t \}

H = \{ h \}

T = \{ t \}

The cardinalities of these sets are given by,

n(S) = 2

n(H) = n(T) = 1

A natural way to define probability of an event is as the ratio of the cardinality of that event to the cardinality of the sample space. This leads to the following definition of the probability of event A,

P(A) = \frac{n(A)}{n(S)}

In plain English,

The probability of an event A is the ratio of the number of ways A can occur to the number of ways all the outcomes in the sample space S can occur.

Another way of saying the same thing,

The probability of an event A is the ratio of the cardinalities of the set A and the sample space S.

This is called the classical definition of probability.

Applying this definition to the events H and T in the first example, we see it conforms to the intuitive notion of probability, namely, that equally likely events should have the same probability. Intuitively, if the coin being flipped is fair, the probabilities of the events H and T should be equal.

P(H) = \frac{n(H)}{n(S)} = \frac{1}{2}

P(T) = \frac{n(T)}{n(S)} = \frac{1}{2}
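The classical definition translates directly into code; a minimal sketch, where the helper name probability is a hypothetical choice and Fraction keeps the ratios exact:

```python
from fractions import Fraction

def probability(event, sample_space):
    # Classical definition: P(A) = n(A) / n(S).
    return Fraction(len(event), len(sample_space))

S = {"h", "t"}
H = {"h"}
T = {"t"}
print(probability(H, S))  # 1/2
print(probability(T, S))  # 1/2
```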

Axioms of Probability#

The classical definition of probability suffices for a general understanding of probability, but there are cases where it fails to account for every feature we would expect a definition of probability to satisfy.

To see this, consider the experiment of spinning a dial on a clock with radius r,

(INSERT PICTURE)

The dial can land at any point between 0 and the circumference of the clock, {2}{\cdot}{\pi}{\cdot}{r}. Between 0 and {2}{\cdot}{\pi}{\cdot}{r} there are infinitely many numbers (0, 0.01, 0.001, \ldots, 1, 1.01, 1.001, \ldots, {2}{\cdot}{\pi}{\cdot}{r}). What is n(S) when the sample space of outcomes is infinitely large? The classical definition of probability is unable to answer this question.

For this reason and other similar cases, the classical definition of probability is not sufficient to completely determine the nature of probability. This leads to the axiomatization of probability, which provides additional constraints that any model of probability must satisfy in order to be considered a probability.

Note

We will see in a subsequent section, when we discuss the uniform distribution, that while we cannot calculate the probability of the dial landing exactly on a given number, we can calculate the probability that the dial lands within a certain interval (that is to say, a certain arc length of the clock’s circumference).

Axioms#

Axiom 1#

P(A) \geq 0

All probabilities are non-negative; no probability is ever negative.

Axiom 2#

P(S)=1

The probability of some outcome from the sample space S occurring is equal to 1.

Axiom 3#

\forall i \neq j: A_i \cap A_j = \varnothing \implies P(\bigcup\limits_{i=1}^{n} A_i) = \sum_{i=1}^n P(A_i)

If the events A_1, A_2, \ldots, A_n are pairwise mutually exclusive, i.e. A_i \cap A_j = \varnothing for every i \neq j, then the probability of the union of all of these events is equal to the sum of the probabilities of each individual event.

Axiom 1 and Axiom 2 are fairly intuitive and straightforward in their meaning, while Axiom 3 takes a bit of study to fully appreciate. To help in that endeavor, consider the following example.

Example

Let us return again to the experiment of flipping a fair coin twice. Consider now two different events A and B defined on this sample space,

A \equiv \text{ getting at least one head }

B \equiv \text{ getting exactly one tail }

Find the probability of P(A \cup B).

The sample space S of this experiment was given by,

S = \{ hh, ht, th, tt \}

Then, in terms of outcomes, clearly, these events can be defined as,

A = \{ hh, ht, th \}

n(A) = 3

B = \{ ht, th \}

n(B) = 2

And, using the Classical Definition of Probability, the probabilities of these events can be calculated by,

P(A) = \frac{3}{4}

P(B) = \frac{2}{4} = \frac{1}{2}

Axiom 3 appears to tell us how to compute P(A \cup B); it says the probability of the union is equal to the sum of the individual probabilities. However, if we try to apply Axiom 3 here, we wind up with a contradiction,

P(A) + P(B) = \frac{3}{4} + \frac{2}{4} = \frac{5}{4} > 1

Here is a probability greater than 1, which cannot be the case. What is going on?

The issue is the condition that must be met to apply Axiom 3; the events A and B must be mutually exclusive, A \cap B = \varnothing, while in this example we have,

A \cap B = \{ ht, th \}

In other words, A and B are not mutually exclusive here. Therefore, we cannot say the probability of the union of these two events is equal to the sum of the probabilities of each individual event. In fact, in this example,

A \cup B = \{ hh, ht, th \}

And therefore, by the Classical Definition of Probability,

P(A \cup B) = \frac{3}{4}

Which is clearly not greater than 1.

If, instead, we consider the event C,

C \equiv \text{ getting exactly two heads }

Then, the outcomes of C are,

C = \{ hh \}

n(C) = 1

And the probability of the event C,

P(C) = \frac{1}{4}

Then, the compound event B \cup C is found by aggregating the outcomes in both of the individual events B and C into a single new set,

B \cup C = \{ hh, th, ht \}

n(B \cup C) = 3

So the probability of the compound event B \cup C is calculated as,

P(B \cup C) = \frac{3}{4}

Notice B \cap C = \varnothing, i.e. B and C are mutually exclusive, so by Axiom 3, we may also decompose this probability into its individual probabilities,

P(B \cup C) = P(B) + P(C) = \frac{1}{2} + \frac{1}{4} = \frac{3}{4}

In this case, the two methods of finding the probabilities agree because the condition (or hypothesis) of Axiom 3 was met, namely, that the events are mutually exclusive. If the condition (or hypothesis) of Axiom 3 is not met, then its conclusion does not follow.
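The whole example can be checked numerically; a minimal sketch reusing the outcome strings from the text, where the helper name P is a hypothetical shorthand for the classical-definition ratio:

```python
from fractions import Fraction

S = {"hh", "ht", "th", "tt"}
A = {"hh", "ht", "th"}  # at least one head
B = {"ht", "th"}        # exactly one tail
C = {"hh"}              # exactly two heads

def P(event):
    # Classical definition: n(event) / n(S), kept exact with Fraction.
    return Fraction(len(event), len(S))

# A and B overlap, so naively summing their probabilities overcounts:
print(P(A) + P(B))  # 5/4 -- not a valid probability
print(P(A | B))     # 3/4 -- the correct value

# B and C are mutually exclusive, so Axiom 3 applies:
assert B & C == set()
print(P(B) + P(C) == P(B | C))  # True
```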

Theorems#

We can use these axioms, along with the theorems of set theory, to prove various things about probability.

Law of Complements#

Symbolic Expression

P(A) + P(A^c) = 1

This theorem should be intuitively obvious, considering the Venn Diagram of complementary sets,

../../_images/sets_complement.jpg

If the entire rectangle encompassing set A in the above diagram is identified as the sample space S, then the theorem follows immediately from Axiom 2, namely, P(S)=1.

Warning

The converse of this theorem is not true, i.e. if two events A and B have probabilities that sum to 1, this does not imply they are complements of one another.

To see an example of what that pesky warning is talking about, consider flipping a fair, two-sided coin twice. Let A be the event of getting a head on the first flip. Let B be the event of getting exactly one head across both flips.

The outcomes of A are given by,

A = \{ hh, ht \}

While the outcomes of B are given by,

B = \{ ht, th \}

In this case,

P(A) + P(B) = 1

But A and B are not complements. To restate this result in plain English,

The sum of the probabilities of complementary events is equal to 1; the converse does not hold, namely, if the sum of the probabilities of two events is equal to 1, those events are not necessarily complements.
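The counterexample can be verified directly; a minimal sketch, where P is again a hypothetical shorthand for the classical-definition ratio:

```python
from fractions import Fraction

S = {"hh", "ht", "th", "tt"}
A = {"hh", "ht"}  # head on the first flip
B = {"ht", "th"}  # exactly one head

def P(event):
    return Fraction(len(event), len(S))

# The probabilities sum to 1 ...
print(P(A) + P(B))   # 1
# ... yet B is not the complement of A:
print(sorted(S - A))  # ['th', 'tt']
print(B == S - A)     # False
```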

Two equivalent formal proofs of this theorem are given below. Choose whichever one makes more sense to you.

Proof #1

By the Classical Definition of Probability, the probability of A \cup A^c is given by,

P(A \cup A^c) = \frac{n(A \cup A^c)}{n(S)}

By Counting Theorem 1,

n(A \cup A^c) = n(A) + n(A^c)

So, the probability of A \cup A^c is,

P(A \cup A^c) = \frac{n(A) + n(A^c)}{n(S)}

Distributing \frac{1}{n(S)},

P(A \cup A^c) = \frac{n(A)}{n(S)} + \frac{n(A^c)}{n(S)}

Applying the Classical Definition of Probability to both terms on the right hand side of the equation,

= P(A) + P(A^c)

On the other hand, by Complement Theorem 2,

P(A \cup A^c) = P(S)

By Axiom 2,

P(S) = 1

Putting it altogether,

1 = P(A) + P(A^c)

Proof #2

By Complement Theorem 3,

A \cap A^c = \varnothing

Therefore, A and A^c are mutually exclusive. So by Axiom 3, we can say,

P(A \cup A^c) = P(A) + P(A^c)

But, by Complement Theorem 2,

A \cup A^c = S

And by Axiom 2,

P(S) = 1

So,

1 = P(A) + P(A^c)

Example

Find the probability of getting at least one head if you flip a fair coin three times.

Let A be the event of getting at least one head, so that the complement A^c is the event of getting no heads at all, i.e. A^c = \{ ttt \}. The sample space for three flips contains n(S) = 2^3 = 8 equally likely outcomes, so by the Law of Complements,

P(A) = 1 - P(A^c) = 1 - \frac{1}{8} = \frac{7}{8}
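This example can also be checked by brute-force enumeration: list all 2^3 outcomes and remove the ones containing no heads. A minimal sketch:

```python
from fractions import Fraction
from itertools import product

# Sample space for three flips, enumerated as outcome strings.
S = {"".join(path) for path in product("ht", repeat=3)}

# The complement event: no heads at all.
no_heads = {outcome for outcome in S if "h" not in outcome}

# Law of Complements: P(at least one head) = 1 - P(no heads).
p_at_least_one_head = 1 - Fraction(len(no_heads), len(S))
print(p_at_least_one_head)  # 7/8
```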

Law of Unions#

Symbolic Expression

P(A \cup B) = P(A) + P(B) - P(A \cap B)

Again, from inspection of a Venn Diagram of overlapping sets, this theorem should be obvious,

../../_images/sets_union_overlapping.jpg

The union is the area encompassed by both circles. When we add the probability of A (the area of circle A) to the probability of B (the area of circle B), we double-count the area A \cap B, so to correct the overcount, we must subtract the offending area once.

The formal proof of the Law of Unions follows directly from Counting Theorem 1 and the Classical Definition of Probability. The proof is given below.

Proof

By the Classical Definition of Probability,

P(A \cup B) = \frac{n(A \cup B)}{n(S)}

By Counting Theorem 1,

P(A \cup B) = \frac{n(A) + n(B) - n(A \cap B)}{n(S)}

Distributing \frac{1}{n(S)},

P(A \cup B) = \frac{n(A)}{n(S)} + \frac{n(B)}{n(S)} - \frac{n(A \cap B)}{n(S)}

Applying the Classical Definition of Probability to all three terms on the right side of the equation,

P(A \cup B) = P(A) + P(B) - P(A \cap B)

Example

Consider a standard deck of 52 playing cards. Find the probability of selecting a Jack or a Diamond.

The sample space for selecting a single card from a deck of 52 cards is shown below,

../../_images/playing_cards.jpg

Let J be the event of selecting a Jack. Let D be the event of selecting a Diamond. This example asks us to find P(J \cup D).

There are 4 Jacks and 13 Diamonds in a standard deck of cards. Therefore, the probability of the individual events is given by,

P(J) = \frac{4}{52} = \frac{1}{13}

P(D) = \frac{13}{52} = \frac{1}{4}

If we stopped at this point and simply added these two probabilities to find P(J \cup D), we would be counting the Jack of Diamonds twice: once when we found the probability of a Jack and again when we found the probability of a Diamond. To avoid double-counting this card, we first find,

P(J \cap D) = \frac{1}{52}

Therefore, the desired probability is,

P(J \cup D) = P(J) + P(D) - P(D \cap J)

= \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13} \approx 0.31
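The Law of Unions can be verified on this example by enumeration, using a hypothetical (rank, suit) encoding of the deck (the names deck, J, D and P are illustrative choices):

```python
from fractions import Fraction

# Hypothetical encoding: each card is a (rank, suit) tuple.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = {(r, s) for r in ranks for s in suits}

J = {card for card in deck if card[0] == "J"}         # selecting a Jack
D = {card for card in deck if card[1] == "diamonds"}  # selecting a Diamond

def P(event):
    return Fraction(len(event), len(deck))

# Law of Unions: P(J u D) = P(J) + P(D) - P(J n D)
assert P(J | D) == P(J) + P(D) - P(J & D)
print(P(J | D))  # 4/13
```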

Probability Tables#

If you have two events, A and B, then you can form a two-way probability table by partitioning the sample space into A and A^c and then simultaneously partitioning the sample space into B and B^c,

Events         A                      A^c                    Probability

B              P(B \cap A)            P(B \cap A^c)          P(B)

B^c            P(B^c \cap A)          P(B^c \cap A^c)        P(B^c)

Probability    P(A)                   P(A^c)                 P(S)=1
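As a sketch, the cells of such a table can be computed for a concrete experiment; here the hypothetical choice is the two-flip experiment, with A the event of a head on the first flip and B the event of exactly one head:

```python
from fractions import Fraction

S = {"hh", "ht", "th", "tt"}
A = {"hh", "ht"}  # head on the first flip
B = {"ht", "th"}  # exactly one head
Ac, Bc = S - A, S - B  # the complementary partitions

def P(event):
    return Fraction(len(event), len(S))

# Each interior cell is a joint probability; the margins are row/column sums.
for row, name in [(B, "B  "), (Bc, "B^c")]:
    print(name, P(row & A), P(row & Ac), P(row))
print("tot", P(A), P(Ac), P(S))
```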