Tuesday, May 31, 2011
What is the probability of cutting two cards of the same rank from a well-shuffled deck?
Monday, May 30, 2011
Section 2.4 - The Binomial Distribution—Bernoulli Trials
A Bernoulli trial produces an outcome that satisfies a criterion with probability \(p\), and so fails to satisfy the criterion with complement probability \(q = 1 - p\). What is the probability that \(n\) successive, independent Bernoulli trials result in exactly \(k\) satisfactory outcomes?
The compound outcome is an arrangement of \(k\) satisfactory outcomes and \(n - k\) unsatisfactory outcomes. Because the Bernoulli-trial outcomes are independent, the outcome probabilities are multiplied to get the combined-outcome probability, and because multiplication is commutative, any individual compound outcome has probability \(p^kq^{n-k}\). There are \(C(n, k)\) such compound outcomes, so the probability that \(n\) successive, independent Bernoulli trials result in exactly \(0 \leq k \leq n\) satisfactory outcomes, when satisfactory outcomes have probability \(p\), is \[ b(k; n, p) = C(n, k)p^kq^{n - k}\] The compound-outcome sample space is not uniform. Even if \(p = 1/2\) (that is, \(p = q\)), \(C(n, k)\) varies with \(k\); the variation becomes more complex when \(p \neq q\).
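To make the formula concrete, here is a minimal Haskell sketch (the names choose and b are mine, not the text's):
-- choose n k = C(n, k), computed as a falling factorial over k!.
choose :: Integer -> Integer -> Integer
choose n k = product [n - k + 1 .. n] `div` product [1 .. k]

-- b k n p = C(n, k) * p^k * q^(n - k), the probability of exactly k successes
-- in n independent Bernoulli trials with success probability p (0 <= k <= n).
b :: Integer -> Integer -> Double -> Double
b k n p = fromIntegral (choose n k) * p ^ k * q ^ (n - k)
  where q = 1 - p
For example, b 2 4 0.5 evaluates to 0.375, the probability of exactly two heads in four fair coin flips.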
Sunday, May 29, 2011
Section 2.3 - Combinations
Given \(n\) distinct items, there are \(P(n, k)\) permutations of \(0 \leq k \leq n\) items. Each permutation of \(k\) items can be re-arranged \(k!\) ways. If item order doesn’t matter, there are
\[ \frac{P(n, k)}{k!} = \frac{n!}{k!(n-k)!} = C(n, k) \]
orderless permutations, or combinations. Sanity check: \(C(n, n) = n!/(n!(n - n)!) = 1/0! = 1\).
Combinations are fun to play with. First, \(C(n, k) = C(n, n - k)\), and so \(C(n, 0) = C(n, n - 0) = 1\). Then
\[\begin{eqnarray}
C(n, k + 1) & = & \frac{n!}{(k + 1)!(n - k - 1)!} \\
& = & \frac{n!(n - k)}{(k + 1)k!(n - k)(n - k - 1)!} \\
& = & \frac{n - k}{k + 1}C(n, k)
\end{eqnarray}\]
\(C(n, k + 1)\) is largest when \((n - k)/(k + 1)\) is closest to 1, which occurs when \(k\) is closest to \((n - 1)/2\). Similarly
\[\begin{eqnarray}
C(n + 1, k) & = & \frac{(n + 1)!}{k!(n + 1 - k)!} \\
& = & \frac{(n + 1)n!}{k!(n + 1 - k)(n - k)!} \\
& = & \frac{n + 1}{n + 1 - k}C(n, k)
\end{eqnarray}\]
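As a check on the first recurrence, here is a small Haskell sketch (the naming is mine) that builds a whole row of \(C(n, k)\) values starting from \(C(n, 0) = 1\):
-- Build the row C(n,0), C(n,1), ..., C(n,n) using the recurrence
-- C(n, k+1) = (n - k) / (k + 1) * C(n, k); each division is exact.
chooseRow :: Integer -> [Integer]
chooseRow n = scanl step 1 [0 .. n - 1]
  where step c k = c * (n - k) `div` (k + 1)
chooseRow 5 evaluates to [1,5,10,10,5,1]; the largest entries sit at the \(k\) values closest to \((n - 1)/2\), as noted above.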
Saturday, May 28, 2011
Section 2.2 - Permutations
How to count the outcomes in a sample space.
Sampling \(n\) distinct items with replacement \(k\) times produces a sample space of size \(n^k\). Ordered sampling without replacement is a permutation, and produces a sample space of size \[ P(n, k) = n(n - 1)(n - 2)\cdots(n - k + 1) = (n)_k \] The first item can be picked \(n\) ways, the second \(n - 1\), and so on; \((n)_k\) is known as the falling factorial. As a sanity check, \(P(n, 1)\) is \(n\), and \(P(n, n + 1)\) is 0.
Multiplying the falling factorial \((n)_k\) by \((n - k)!\) turns it into the full factorial \(n!\). Equivalently, multiplying \(P(n, k)\) by 1 in the form \((n - k)!/(n - k)!\) gives an alternate formula for permutation counting: \[ P(n, k) = \frac{n!}{(n - k)!} \] For more sanity, \(P(n, n) = n!/(n - n)! = n!/0! = n!\) (because \(0! = 1\)) and \(P(n, 0) = n!/(n - 0)! = n!/n! = 1\).
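A small Haskell sketch of the two equivalent formulas (the function names are mine):
-- P(n, k) as the falling factorial n (n-1) ... (n-k+1).
permFalling :: Integer -> Integer -> Integer
permFalling n k = product [n - k + 1 .. n]

-- P(n, k) as n! / (n - k)!; only meaningful for 0 <= k <= n.
permFactorial :: Integer -> Integer -> Integer
permFactorial n k = factorial n `div` factorial (n - k)
  where factorial m = product [1 .. m]
permFalling 5 2 and permFactorial 5 2 both evaluate to 20, and permFalling n (n + 1) is 0 because the product picks up a zero factor, matching the sanity checks above.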
Friday, May 27, 2011
Section 2.1 - Introduction
We avoid Peirce’s observation (“[…] but it may be doubted if there is a single extensive treatise on probabilities in existence which does not contain solutions absolutely indefensible”) through careful modeling, consistent methods, developed intuition, result analysis, and reasonableness tests. We must also prepare for non-rational probabilities and non-uniform probability distributions.
Uniform probability assignments allow computations to be either the ratio of satisfactory outcomes over all outcomes, or the sum of individual satisfactory outcome probabilities. Non-uniform probability calculations can only be carried out by summing individual probabilities. Where you end up depends on the direction you start walking. “Random” without modifier is the code-word for uniform probability assignments.
We now discover tools for computing probabilities over finite discrete sample spaces, the related uniform solution methods (uniform across the various probability models, one of which has non-uniform probability assignments), and the intuition to guide their use, in part by appealing to the frequency interpretation of probability.
Thursday, May 26, 2011
Example 1.9-3 - The Four-Card Deck
A card deck contains four cards: AC, 2C, AD, and 2D. The deck is shuffled, and a pair of cards is dealt (and mistakes may be made).
Example 1.9-2 - The Two Children Problem
A family has two children, one of which is a boy. What is the probability the other child is a boy?
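Under the usual symmetric reading of the problem, a minimal Haskell enumeration sketch (my own, not necessarily the book's treatment) gives the answer:
-- Assumed model: each child is a boy (B) or girl (G), all four orderings
-- equally likely, and "one of which is a boy" restricts the sample space.
data Child = B | G deriving (Eq, Show)

atLeastOneBoy :: [(Child, Child)]
atLeastOneBoy = [ (c1, c2) | c1 <- [B, G], c2 <- [B, G], c1 == B || c2 == B ]

-- length [ f | f <- atLeastOneBoy, f == (B, B) ] is 1, and
-- length atLeastOneBoy is 3, so the probability is 1/3 under this reading.
The answer is sensitive to exactly how the condition “one of which is a boy” is modeled, which is presumably why the example appears in the section on randomness.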
Wednesday, May 25, 2011
Section 1.10 - Critique of the Model
The probability model is based on finite, symmetric sample spaces, which are easy to manipulate and provide simple outcome-probability calculations. However, using rational numbers for ratios can be limiting (for example, it is unlikely that the probability that a flipped thumb tack lands point up is expressible as a ratio), as can unbounded sample spaces (for example, the probability that a sequence of coin tosses eventually produces a heads involves outcomes of potentially unbounded length).
Separating the probabilistic model from its mathematical underpinnings promotes philosophical clarity, and emphasizes that the mathematics have wider use than just supporting the model. Also missing is the connection between the model based on sample-space symmetry and the intuitive frequency-based probability interpretation; Section 2.8 makes the connection.
Tuesday, May 24, 2011
Section 1.9 - Randomness
Like symmetry and independence, randomness is recognizable by the absence of contrary evidence. Concisely describing a pattern would be suitable contrary evidence. Randomness is best taken as a mathematical concept to avoid philosophical knots and physical antinomies. It’s like quantum mechanics: all outcomes are equally possible until the trial, when all possibilities collapse into the outcome (at which point randomness disappears). The concise pattern criterion seems to prohibit any kind of finitely defined random-number generator.
To some extent, randomness is relative: if you can’t figure out a pattern, then you should assume randomness, even though somebody else might be able to find a pattern. Relative randomness may be unpalatable, but it does allow a pragmatic justification for getting on with the task. Randomness can be interpreted as symmetry (identical probabilities, or uniformity), but symmetry can’t be interpreted as randomness. Sticking to symmetric sample spaces with independence helps sidestep semantic and philosophical problems with randomness. A random sample is actually an average over the sample space.
Monday, May 23, 2011
Section 1.8 - Conditional Probability
Let the trial T≥2 be flipping 10 coins such that at least one comes up heads. What is the probability that an outcome of T≥2 has at least two heads?
The requirement that at least one coin land heads-up is a condition, and the resulting probability is known as a conditional probability. (Don’t confuse the condition on any trial outcome (“contains at least one head”) with the predicate on the satisfactory outcomes (“contains at least two heads”); the condition is satisfied by every outcome in the sample space, while the predicate may or may not be satisfied by an outcome.) There are at least three ways to find conditional probabilities: direct counting, indirect counting, and complement probability. (In fact, these three ways can solve any probability problem, not just conditional-probability problems.)
Before getting to the solutions, it’s important to recall a fact, and understand why it’s important. What is T≥2’s sample space? The sample space is the set of all possible trial outcomes. T≥2’s sample space is not { ti | 0 ≤ i < 1024 } (the subscript i is interpreted as a 10-bit binary number, with 1 bits interpreted as heads) because t0 is not a possible outcome for T≥2: t0 contains no heads, which contradicts the condition that T≥2 produce outcomes with at least one head. Instead, T≥2’s sample space is { ti | 0 < i < 1024 }, because now every ti has at least one heads. The sample space is symmetric, and of size 1023, which means ti for 0 < i < 1024 has probability 1/1023. These are the two most common errors when dealing with conditional probabilities (that is, these are the two mistakes I make the most): forgetting to adjust the sample space to accommodate the conditions, and misadjusting the sample space so it contains too many or too few outcomes.
The simplest and most straightforward solution (in some sense) is to directly count all the outcomes satisfying the predicate P(ti) = ti has at least two heads and add their probabilities together. There are 1013 such outcomes (this result courtesy of the Haskell expression
length [ i | i <- [1..1023], ones i > 1 ]
where ones n returns the number of one bits in the binary representation of n), and the resulting probabilities sum to 1013/1023 ≈ 0.99.
Direct counting doesn’t scale well with sample-space size or outcome-predicate complexity. However, because the predicate partitions the sample space into two subsets, it may be possible to indirectly count the satisfactory outcomes by counting the unsatisfactory outcomes and using the fact that
unsatisfactory size + satisfactory size = sample-space size
to get
satisfactory size = sample-space size - unsatisfactory size
In this case, the unsatisfactory size is 10 (there are ten ways to toss ten coins so that exactly one heads comes up), and so satisfactory size = 1023 - 10 = 1013, and the probability is again 1013/1023 ≈ 0.99.
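For reference, a runnable version of those counts (the definition of ones here is my own guess at the helper the post mentions, using Data.Bits; the original definition isn't shown):
import Data.Bits (popCount)

-- ones i: the number of one bits (heads) in the binary representation of i.
ones :: Int -> Int
ones = popCount

-- Direct count: outcomes with at least two heads (evaluates to 1013).
satisfactory :: Int
satisfactory = length [ i | i <- [1..1023], ones i > 1 ]

-- Indirect count: outcomes with exactly one heads (evaluates to 10).
unsatisfactory :: Int
unsatisfactory = length [ i | i <- [1..1023], ones i == 1 ]
Both counts agree with the arithmetic above: 1013 + 10 = 1023.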
The third solution is a variant of the indirect counting because it too relies on outcome predicate partitioning:
Pr(unsatisfactory outcome) + Pr(satisfactory outcome) = Pr(sample-space outcome)
Pr(sample-space outcome) is certainty, that is, 1, and so
Pr(satisfactory outcome) = 1 - Pr(unsatisfactory outcome)
Pr(unsatisfactory outcome) is known as the complement probability. In this case, Pr(unsatisfactory outcome) = 10/1023 ≈ 0.01, to get Pr(satisfactory outcome) = 1 - 0.01 = 0.99.
Indirect counting and complement probability are useful to the extent that counting the unsatisfactory outcomes is easier than counting the satisfactory outcomes. If counting the two kinds of outcomes is of similar difficulty, it’s likely that none of the three solution techniques is going to have a clear advantage over the others.
Sunday, May 22, 2011
Section 1.7 - Subsets of Sample Spaces
Given a predicate over a sample space (a predicate partitions the sample space into two subsets, one containing the outcomes that satisfy the predicate and the other containing the outcomes that don’t), what is the probability a trial will produce an outcome satisfying the predicate? That outcome probabilities add is an important fact to exploit when answering this question. Because outcome probabilities in a symmetric sample space are identical, the probability of a satisfactory outcome occurring is the sum of the probabilities of the satisfactory outcomes, or the ratio of the number of satisfactory outcomes over the sample-space size. Handling complex predicates over large sample spaces is difficult and requires more math; just go for the main ideas now.
Probabilities are scaled to be in the range [0 (impossibility) .. 1 (certainty)], but other ranges are possible, such as [1 (certainty) .. ∞ (impossibility)) or [0 (certainty) .. ∞ (impossibility)). (The [1 .. ∞) range is the inverse of the [0 .. 1] range.) In either alternative range, combining outcome probabilities is a problem because simple addition doesn’t behave properly (combining to certainty, for example). Venn diagrams do not scale with the number of sample-space coordinates, and will not be considered further.
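As a small worked example (my own, not from the text): draw one card from a well-shuffled 52-card deck and ask whether it is a spade. The 13 satisfactory outcomes each have probability 1/52, so the probability is the sum 13 × (1/52), or equivalently the ratio 13/52 = 1/4.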
Saturday, May 21, 2011
Section 1.6 - Independence
Given two sample spaces X and Y, the Cartesian product X × Y is the sample space { (xi, yj) | xi ∈ X ∧ yj ∈ Y }. Sample space products may be extended to any number of constituent sample spaces.
Outcomes are independent if prior outcomes have no influence over future outcomes. Independence is a tricky concept, and is unlikely to be realistic.
The Cartesian product of symmetric, independent sample spaces (that is, a sample space of independent outcomes) is a symmetric sample space. The probability of an outcome from a Cartesian-product sample space is the product of the probabilities of the constituent outcomes, or you can assume symmetry and assign the resulting probability.
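As a quick illustration (my own, not from the text): rolling two fair dice is the Cartesian product of two six-outcome sample spaces, giving 36 pairs. Independence assigns each pair the product probability (1/6)(1/6) = 1/36, which is the same value that symmetry over the 36-outcome product space assigns directly.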
Understand what things are, but then understand why they are that way.
Friday, May 20, 2011
Section 1.5 - Symmetry as the Measure of Probability
Given the trial X with sample space {xi}, the task now is to assign probabilities pi to the outcomes xi.
By definition, the probability that trial X produces an outcome in the sample space is certainty, conventionally represented by the value 1. If outcomes are symmetric, that is, fundamentally indistinguishable, then the associated probabilities should also be symmetric, that is, indistinguishable, which means pi is 1/n, where n is the sample-space size. (For example, rolling a fair die produces outcomes that are fundamentally indistinguishable; the number of pips on the die faces has no influence on the outcome, although the pips do help distinguish one outcome from another. Similarly, the outcome of drawing a card from a well-shuffled deck is uninfluenced by the cards’ rank and suit. In contrast, the outcomes that result from measuring the height of the next person you meet are not indistinguishable; you’re more likely to meet someone between five and six feet tall than someone outside that range.)
An aside: it’s not clear why a particular value is being assigned to symmetric probabilities, rather than just requiring that pi = pj for i, j ∈ {1, …, n}. You might want to offer rolling a fair die as a real-life motivation for equal probabilities, but that raises two problems. First, the probabilities are being developed without reference to real-world motivations, making any motivation suspect. Second, and more importantly, it doesn’t seem that you can deduce any particular probability value from any real-world motivation within the single-event model (which would be begging the question in any event); you need the frequency model, which was rejected in Section 1.3. Indistinguishable outcome probabilities are a consequence of matching indistinguishable outcomes.
Symmetry is a judgment. Perceived symmetry among sample-space outcomes determines the outcome’s probabilities, otherwise you’re stuck, at least for the moment. Symmetry is determined largely by the absence of contrary evidence; if a sample space is not clearly asymmetric, assume it’s symmetric.
The two axioms used to assign probabilities are 1) a trial X produces an outcome in the sample space X with certainty, represented by probability 1, and 2) a trial X produces an outcome xi from a symmetric sample space X of size n with probability 1/n.
Because probabilities are additive, the probability of a combination of symmetric outcomes is the sum of the probabilities of the individual outcomes. (But why are probabilities additive? Although probabilities are equated with numbers, where was it established that addition is defined on those numbers? This seems like it should be another axiom. You might want to argue that additive probabilities are most consistent with the real world, but that seems a judgment possible only with the rejected frequency model. You might want to argue that outcome probabilities should combine to 1, but so what? I can define each outcome probability as 1 and multiply them to get 1. You might want to argue that assigning every outcome probability 1 is nonsensical, but why is that so? What is it about the single-event model that brings out the absurdity?)
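A small concrete check of the two axioms plus additivity (my own example): a fair die has a symmetric sample space of size 6, so axiom 2 assigns each face probability 1/6; by additivity, the probability of rolling a 1 or a 2 is 1/6 + 1/6 = 1/3, and the sum over all six faces is 1, matching axiom 1.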
Using intuition to battle circularity, consider equally likely outcomes to occur “at random” (intuition will apparently be replaced with more careful reasoning in Section 1.9).
Thursday, May 19, 2011
Section 1.4 - The Single-Event Model
A trial X produces an outcome drawn from a finite set of possible outcomes { xi }, 1 ≤ i ≤ n. (For example, if X represents the roll of a single die, the set of possible outcomes is { xi } for 1 ≤ i ≤ 6, where xi represents the outcome “the upper-most face of the die shows exactly i spots”.) The set of possible outcomes of X, { xi }, is known as the sample space, sometimes also called X. A single trial outcome can be generalized to a sequence containing finitely, countably, or non-countably many outcomes representing a similarly-sized sequence of trials.
Given a trial X, the first task is to find the sample space. Do not confuse the map and the territory: abstraction is at play when defining the sample space for X. (For instance, to continue the die example, a tossed die may get caught in a hole and land point up. However, such an occurrence is so unlikely that most die-roll models can ignore it.) In addition, xi is not a number; the operation xi + xj doesn’t make any sense if + is the usual integer addition.
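One way to see the “outcomes are not numbers” point in code (a sketch of my own, not from the text): model the die’s sample space as a plain enumeration, on which no arithmetic is defined.
-- The die sample space as labels rather than numbers; there is no (+) on DieOutcome.
data DieOutcome = Pips1 | Pips2 | Pips3 | Pips4 | Pips5 | Pips6
  deriving (Eq, Show, Enum, Bounded)

sampleSpace :: [DieOutcome]
sampleSpace = [minBound .. maxBound]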
Wednesday, May 18, 2011
Section 1.3 - The Frequency Approach Rejected
Common probability models include single-event probabilities and probability as the ratio of the number of occurrences of a particular outcome over the large, and possibly infinite, number of trials (the frequency model). The odds a to b is a third probability model, equivalent to the probabilities a/(a + b) and b/(a + b).
Science has a strong bias towards the measurable. Mathematics has beautiful and interesting theorems with hypotheses that are incompatible with potential applications. There is a tendency to push for practical results using constructivist or computational methods. New applications of probability should be carefully considered to ensure appropriate use.
The frequency model runs into mathematical difficulty in both definition and execution; in particular, it is difficult to deal with large or infinite possibilities, and difficult to characterize the resulting probability values. Thus the frequency model is rejected as a hypothesis, and will be derived as a thesis from other models.
Tuesday, May 17, 2011
Section 1.2 - Models in General
We think about the world using models we construct in our heads. Models are neither true nor false, just more or less consistent with our observations of the world. In addition to consistency, models may be more or less complex (Occam’s Razor), have structural criteria (dichotomy, or Levi-Strauss’ groups of three), and be subject to other aesthetic criteria (beauty or crispness).
Models abstract the world, leaving out details considered irrelevant. Probabilistic thinking will be presented as a sequence of models, each model less abstract (more detailed) and more consistent than its predecessor. Successor models remove inconsistencies (paradoxes) from predecessor models. Pay attention to the sequence, and how the sequence grows, so you can develop your own models as needed.
Monday, May 16, 2011
Section 1.1 - Introduction
Few circumstances are completely uncertain; there's usually some underlying certainty involved. Probabilistic thinking recognizes and exploits the underlying certainty. Normal life uses probabilistic thinking for prediction; scientific life also uses it in other ways, such as for modeling, as in errors in measurements, or fundamental explanations, as in quantum mechanics a la Copenhagen.
Several probabilistic thinking styles exist, and several are needed. For example, classical probabilistic thinking excludes independence, an important idea for quantum probabilistic thinking. In addition, other areas - such as mathematics and computation - have thinking styles related to, but different from, probabilistic thinking.
Probabilistic thinking can be subjective (belief based) or scientific (objective). Scientific objectivity recognizes and accommodates the judgments underlying probabilistic thinking; subjective thinking leaves these judgments unrecognized. Studying probability as an end is largely unhelpful; probabilistic thinking must be grounded in the uncertainty of the world to be helpful. Probabilistic thinking is difficult, but less formal styles are easier to learn and use.
But before we can quantify uncertainty, we need some tools.
Sunday, May 15, 2011
Table of Contents
1 - Probability
2 - Some Mathematical Tools
3 - Methods for Solving Problems
4 - Countably Infinite Sample Spaces
5 - Continuous Sample Spaces
6 - Uniform Probability Assignments
7 - Maximum Entropy
8 - Models of Probability
9 - Some Limit Theorems
10 - An Essay on Simulation
Saturday, May 14, 2011
Preface
What is probability?
Fifth, I also found that there is a large assortment of personal probabilists, the most prominent being the Bayesians, at least in volume of noise. Just what the various kinds of Bayesians are proclaiming is not always clear to me, and at least one said that while nothing new and testable is produced still it is “the proper way to think about probability”. (p. vi)
Friday, May 13, 2011
The Object of Our Veneration
This blog is a diary of a trip through The Art of Probability by Richard Hamming, published in 1991 by Addison-Wesley.