The fundamental theorem of combinatorics is also the most obvious. If there are six possible outcomes from the throw of a die, and two outcomes from the toss of a coin, Then the two events taken together have twelve possible outcomes. This is really a statement about the cardinality of the cross product of two finite sets, and its proof is best left in set theory, where it belongs.

There is a catch. The individual events must be independent. If someone decides to drop the coin flat on the table, heads up, whenever the die comes up 6, there are only 11 possible outcomes, not 12. This disclaimer is obvious, but in the real world, seemingly unrelated events are often connected in subtle ways.

The branch of mathematics known as combinatorics combines independent events and counts, or at least estimates, the number of outcomes. Probability brings in the concept of randomness, so that one of these outcomes can be chosen at random. Then we can compute the odds of a royal flush, 4 chances in 2598960. That will come later. For now, let's start counting events.

A permutation is a specific linear arrangement of items, such as books on a shelf. The number of n element permutations is the product of the integers from 1 to n, which is called n factorial, and is written n!. Select the first (leftmost) book in n different ways, then the next book in n-1 ways, and so on until there is but 1 way to choose the last book. The number of outcomes is n × (n-1) × … × 2 × 1 = n!.

Note that the n events are numerically independent, not physically independent. After selecting David Copperfield, we cannot select it again. Yet after selecting any of the n books, there are n-1 books to choose from, then n-2 books, and so on. We only need numerical independence to multiply outcomes together.

For convenience, define 0! = 1.

5 books arranged on a shelf

When k out of n elements are indistinguishable, e.g. k copies of the same book, the number of different permutations is n!/k!. Temporarily distinguish the k elements, e.g. number the copies of David Copperfield, so that there are again n! permutations. The order of the k copies doesn't really matter, so k! permutations map onto 1. Thus we obtain n!/k!.

A similar factor must be included for each group of repeated elements. For example, The number of permutations of the letters "JJJKLMMN" is 8!/3!/2! = 3360.

In a truncated permutation, only the first k elements of an n-element permutation are significant. As before, there are n elements to choose from, then n-1, then n-2, down to n-k+1. This gives n!/(n-k)!.

There is no simple formula for a truncated permutation with repeated elements, though one can count them out systematically. Consider arranging 4 letters from ABCDEEE. If you select at most one E, that's a truncated permutation, 4 out of 5, giving 120. With three E's on the shelf, there is one other letter, A B C or D, and it is placed in one of 4 positions, hence 16. Finally 2 E's on the shelf calls for 2 other letters, which can be done in 6 ways. Then we are permuting 4 elements with 2 repeated E's, so 6 times 4!/2! = 72. The sum is 208, and there just isn't a nice formula for that.

How many ways can n people sit at a round table? These are circular permutations.

A designated person, person number 1, sits somewhere; call this the head of the table. Let the permutation run clockwise from there. There are n-1 people to choose from, then n-2, then n-3, and so on down to the last. This is (n-1)!.

[ people around a table, numbered 1 at the top then 5 3 6 4 2 around, or some such ]

If order is irrelevant when selecting k out of n objects, these sets are called combinations.

A box may have 13 different books in it, and you're selecting 4 to take to school. How many 4 book sets are possible?

Start by calling this a truncated permutation. You're going to put the four books on the shelf, left to right, and there are 13!/9! ways to do that. As you gather up the four books and put them in your backpack, the order doesn't matter, so the 4! arrangements of the 4 books all map onto the same set. The number of combinations is 13!/(9!×4!), or 715.

Choose 4 books from the 13 in the box

In general, a set of n elements produces n!/(k!×(n-k)!) k-element subsets. This is a common expression, called "n choose k", and written (n:k) for convenience. (Note, (n:k) is my own ascii notation; the standard notation is two dimensional, and shown on the right.) Standard notation for n choose k

Although most applications partition a set into two subsets, e.g. the backpack and the box, the formula can be generalized to handle many subsets. Place four books in the backpack, three in a desk drawer, and two on the mantle, leaving 4 in the box, and the number of book distributions is 13!/(4!×3!×2!×4!) = 900900.

Leaving a combination's size unspecified seems, at first, to complicate the problem. Select an arbitrary number of books from a box containing 13 books, and stuff them into a backpack, and the number of possibilities is (13:0) + (13:1) + … + (13:13) = 8192. The size of the subset is considered first, then the possible combinations for each size. But there is an easier way. Each book either goes into the backpack or remains in the box. Two choices for each of 13 independent events yields 213 = 8192 possibilities. Of course you can have more than two containers. The 13 books can be placed into k different containers in k13 ways.

When the elements are indistinguishable (say 13 marbles), they can be placed in k containers in (13+k-1:13) ways. To see this, picture each marble distribution as a row of marbles with k-1 container dividers inserted into the row. The marbles between divider i-1 and i are located in the ith container. Set k = 2, for instance, and there are (14:13) = 14 possible outcomes. Any of 0 to 13 marbles can be placed in the first box, leaving the rest for the second box.

Place 13 marbles in 5 containers

Pascal is best known for his triangle of numbers, with 1 at the apex. The jth row contains j numbers, centered about a vertical line that passes through the apex. Thus each number has two numbers below it, one slightly to the left and one slightly to the right. The second row, just below the apex, has two 1's just to the left and right of the midline. The third row contains 1,2,1, with 2 on the midline. Each number in the triangle is the sum of the two numbers from above. Boundary numbers are always 1. Here are the first 8 rows of Pascal's triangle.

The first 8 rows of pascal's triangle

When expanded, the coefficients on the terms of (x+y)n form the nth row of Pascal's triangle. In other words, the coefficient on xjyn-j is the jth number in the nth row of the triangle. (Counting starts at zero, not one.) Example: (x+y)3 = x3 + 3x2y + 3xy2 + y3. This relationship can be proved by induction on n.

Start with (x+y)n-1, and expand by the binomial theorem, giving:

a0xn-1 + a1xn-2y + a2xn-3y2 + … + an-2xyn-2 + an-1yn-1

Where the coefficients ai come from row n-1 of Pascal's triangle. Then multiply by x+y. Each coefficient in the product, other than xn and yn, is the sum of two coefficients from the previous product. For example, x3y6 in (x+y)9 comes from x2y6×x and x3y5×y. Hence we add the coefficients on x2y6 and x3y5 from (x+y)8 to get the coefficient on x3y6 in (x+y)9. The coefficients on the terms xn and yn are always 1. This mirrors the construction of Pascal's triangle perfectly.

At the same time, there is another formula for the jth coefficient in the product. By the distributive law, there are 2n terms, selecting x or y from each factor x+y. The number of terms with j instances of x and n-j instances of y is the number of ways we can choose j factors out of n (these j factors presenting x rather than y), which is n choose j, or (n:j), or n!/(j!×(n-j)!). This means the jth number in the nth row of Pascal's triangle is also (n:j). Since these numbers are the coefficients on the nth power of a binomial, they are often called binomial coefficients.

Set x=y=1 and obtain the immediate corollary: 2n is the sum of (n:i) as i runs from 0 to n. The sum of all the binomial coefficients from 0 to n, or the sum of the entries in the nth row of Pascal's triangle, is 2n. When x and y are 1 and -1, the alternating sum of binomial coefficients is 0, except for the top row.

Each term in the product results from choosing x or y in each factor

Remember that the binomial theorem fails if multiplication does not commute. In the quaternions, (i+j)2 is not i2+2ij+j2. It is in fact i2+ij+ji+j2, which equals -2.

The binomial theorem generalizes to the multinomial theorem when the original expression has more than two variables, although there isn't a triangle of numbers to help us picture it. When x+y+z is raised to the n, there are n!/(i!×j!×k!) ways to select i instances of x, j instances of y, and k instances of z. Hence this combination becomes the coefficient on xiyjzk.

Let's look at (x+y+z)7. The coefficient on xy2z4 is the number of ways you can arrange xyyzzzz. That's 7! divided by (1!×2!×4!), or 105.

What is the sum of k×(n:k), i.e. k times (n choose k), as k runs from 0 to n? These types of questions don't come up very often, but when they do, there's a trick.

Remember that (1+x)n is the sum of (n:k) × xk, by the binomial theorem. Differentiate this with respect to x, then substitute x = 1. The right side is the sum of k×(n:k), which is exactly what we want. The left side becomes n×2n-1.

If you are interested in the sum of k2×(n:k), take the second derivative and replace x with 1. This isn't quite right, because you get the sum of k×(k-1)×(n:k), but you can always ad n×2n-1 back in. This cancels the -k in k×(k-1). The answer is (n2+n)×2n-2. Note that similar formulas can be derived for the sum of k3×(n:k), and so on.

This is another connection between discrete and continuous mathematics, from complex differentiation to combinatorics. It's not for the faint of heart. It assumes you have a good understanding of complex variables and analytic functions. You can skip this theorem if you like; it is rarely used.

Just to give you a hint of where we are headed, we want a binomial expansion, similar to the binomial theorem, for the square root of x+y, that is, (x+y)½.

Let c be a constant, and let f be the differentiable function (c+x)t, where t is a positive real number. What is the Taylor series for f?

Evaluate f at 0 and get ct.

The first derivative is t×(c+x)t-1, hence f′(0) = tct-1.

The second derivative evaluated at 0 is t(t-1)ct-2, and so on.

Divide these derivatives by k! to get the taylor coefficients. This gives a factor of the form t×(t-1)×(t-2)×…(t-k+1) over k!. Use the binomial coefficient (t:k) to represent this expression. When t is a positive integer n, the result is the traditional binomial coefficient (n:k). We are simply generalizing the notation to any real number t.

Now we are ready to write the power series for f.

f = ∑{k=0,∞} (t:k)ct-kxk

When t is a positive integer, such as n, the nth derivative of (c+x)n is a constant for all x, and all higher derivatives are 0. The power series truncates to a finite polynomial, as one would expect when f itself is a polynomial. In fact, the power series reproduces the traditional binomial theorem.

To evaluate convergence, write f as exp(t×log(c+x)). This is analytic at 0, hence f agrees with its power series on a circle of convergence. Since log(c+x) converges out to |x| = c, and exp() converges everywhere, f converges at least as far as c. If x is even slightly larger than c, say c+ε, move far out in the power series and consider the ratio of adjacent terms, (t-k)/(k+1) times (c+ε)/c. The first fraction approaches 1 for large k, and the second fraction is a fixed number greater than 1. By the ratio test, this series does not converge, hence the radius of convergence is c.

Note, When t = n, f is a polynomial that converges across the entire plane.

Since f is analytic it extends uniquely to complex numbers, i.e. (c+z)t. Even the exponent t could be complex as well. This is the continuous binomial theorem in its full generality.

Apply this to sqrt(c+x). Rewrite this as (c+x)½, then build the following power series.

f = ∑(k=0,∞) (½:k)c½-kxk

Is there a convenient formula for ½ choose k?

Set the denominator of k! aside for the moment, and consider the product of 1/2 times -1/2 times -3/2 times -5/2 etc. Multiply by 2k and find 1×-1×-3×-5… out to -2k+3.

Setting the signs aside for the moment, the product becomes (2m)! over (m!×2m), where m = k-1.

Divide by 2k (since we multiplied by 2k earlier), and obtain the following.

(2m)! over m!×2m×2k

2×(2m)! over m!×4k

2k×(2m)! over k!×4k

2k×(2k-1)×2m)!/(2k-1) over k!×4k

(2k)! over (2k-1)×k!×4k

Bring the denominator of k! back in, and get this.

(½:k) = (2k)! over (2k-1)×k!×k!×4k

(½:k) = (2k:k) over (2k-1)×4k

This isn't quite right when k = 0, as it produces -1 instead of 1. But remember, we set the signs aside. Let's bring them back in now. When k is 1 the sign is positive, and it alternates from there, so multiply by (-1)k+1, and it all works out. Verify the above formula, with alternating sign, for k = 0 1 2 3 and 4.

With this in hand, the power series for sqrt(c+x) is as follows.

sqrt(c+x) = ∑(k=0,∞) (-1)k+1 × (2k:k) / ((2k-1)×4k) × c½-kxk

If you like, pull sqrt(c) out of the sum, and replace c-kxk with (x/c)k.

sqrt(c+x) = -sqrt(c) × ∑(k=0,∞) (2k:k) / ((2k-1)×4k) × (-x/c)k

Assuming p is prime, how many factors of p are in n!?

There are n/p integers, up to n, that are divisible by p. Here n/p is an integer operation, n divided by p with the remainder thrown away. To this we must add n/p2, since this many integers are divisible by p2. Continue all the way up to pt. However, there is another way to write this, more compact than a sum of quotients.

Write n in base p. In other words:

n = a0 + a1p + a2p2 … atpt.

The kth quotient from the previous sum is n/pk, with remainder discarded, which can be written in base p as follows.

ak + ak+1p + ak+2p2 … atpt-k.

Add these rewritten quotients together, and regroup terms according to the coefficients, giving:

a1 + a2(1+p) + a3(1+p+p2) … at (1+p+…+pt-1).

Multiply this by p-1, giving:

a1(p-1) + a2(p2-1) + a3(p3-1) + … + at(pt-1).

Throw in a0-a0. Group a0 with the sum of akpk to rebuild n. Group -a0 with the sum of -ak to build a number that we will call digits(n), the sum of the digits of n in base p. Therefore the number of factors of p in n! is (n-digits(n)) ÷ (p-1).

Note that p-1 goes in evenly; there is no need to round or truncate.

For instance, there are (17-(2+2+1))/(3-1) = 6 powers of 3 in 17!. These come in via 3, 6, 9, 12, and 15.

There are 15 factors of 2 in 17!

To find the powers of p in the binomial coefficient (n:k), the integers n, k, and n-k cancel, leaving (digits(k) + digits(n-k) - digits(n)) / (p-1).

If n is a power of p, digits(n) is 1. Digits(k) + digits(n-k) will contribute more than 1, unless k is 0 or n. Other than the boundaries, (n:k) contains factors of p.

When n = p, each binomial coefficient has precisely one factor of p, (digits(k) + digits(p-k) - 1) / (p-1), except for k = 0 or p. Of course the special case of n = p has a simpler proof, insofar as p! has exactly one factor of p, and k! and (p-k)! have none, unless k = 0 or p.

Binary coefficients for 7 choose k: 1 7 21 35 35 21 7 1

By the binomial theorem , (x+y)p has all its terms divisible by p, except for xp and yp. If x is not divisible by p, and y has k factors of p, then (x+y)p begins with xp, which is not divisible by p, then pxp-1y, which has k+1 factors of p, then a lot of terms with at least k+2 factors of p. Thus the minimum power of p, ignoring xp, has increased by one. This is called ratcheting the exponent.

Note that this breaks down when y has one factor of 2, as in (x+2w)2 = x2+4xw+4w2. The binomial coefficients stop before the higher powers of 2 kick in, and the second and third terms combine to give a multiple of 8, rather than a multiple of 4. Ratcheting works properly when p is odd, or when the second term has more than one factor of 2.

Use the theory of combinations and independent events to count hands in poker. You can also use these formulas to compute the odds of winning the lottery, but it's really depressing!

There are 52 choose 5 or 2598960 possible hands. With this as a common denominator, we only need find the number of hands in each category.

A straight flush consists of 5 cards in sequence, all of the same suit. There are 4 suits, and the sequence can start with anything from ace to 10 inclusive. That's 4×10 = 40 possibilities.

A straight is merely an ascending sequence, with the suit unconstrained. There are 10×45 = 10240 straights, but we should really subtract the straight flushes, leaving 10200 proper straights.

A flush consists of 5 cards of the same suit. There are 4 suits, and 13 choose 5 combinations in each. Multiply this out and subtract 40, giving 5108 flushes.

If you want four of a kind, pick a rank with all four suits represented, then any other card. That's 13×48 = 624.

A full house requires 3 of one rank and 2 of another. That's 13×4 times 12×6 = 3744.

Three of a kind selects a rank, and 3 from that rank, which is 13×4. Then select any two other cards, 48 choose 2, then take away the full houses. The result is 54912.

For two pair, select two ranks, 13 choose 2, and 2 cards for each, 6×6. Then bring in one of the remaining 44 cards, giving 123552.

Finally we have a pair. Choose a rank and two cards from that rank, 13×6. Then choose 3 ranks from the remaining 12, and any of 4 cards for each. This gives 1098240.

Here is a summary of the hands, starting with the best. Remember the common denominator of 2.6 million.

Straight flush = 40 = 0.0015%
Four of a kind = 624 = 0.024%
Full house = 3744 = 0.14%
Flush = 5108 = 0.19%
Straight = 10200 = 0.39%
Three of a kind = 54912 = 2.1%
Two pair = 123552 = 4.7%
Pair = 1098240 = 42%

How many people must be present before two of them are likely to have the same birthday? Since there are 365 days in a year, you might think you need 100 people or more. Actually 23 people gives you an even chance of finding a duplicate birthday. A few more, and the odds are definitely in your favor. This surprising statistic is called the birthday paradox.

The math is easy if you turn the problem around. What are the odds that every birthday is unique? The first one is free. The second person has a new birthday with probability 364/365. (I'm ignoring leap year.) The third person has a new birthday with probability 363/365, then 362/365, and so on. By the time you get to 343/365, the ratio is 49.27%. So you have a 50 50 shot at finding a duplicate birthday.

Assume there are n choices, and you are looking for duplicates. The odds of having no duplicates after k selections is the product of 1 - i/n, as i runs from 0 to k-1. If r and s are quite small, then 1-r times 1-s is approximately 1-(r+s). Thus 1-0.001 times 1-0.002 is about 1-0.003. Apply this shortcut here. Since i is small relative to n, the product is approximately 1 minus ∑ i/n. This in turn is 1 minus (the sum of the first k integers) divided by n. The sum of 1 through k is (k2+k)/2. This is well known, but will be proved and generalized in chapter 6. Set k to the square root of n, and the sum of the first k integers is half of n. Divide this by n to get ½, whence 1 - ½ = ½. Therefore you have an even chance of finding a duplicate amongst k selections, where k is the square root of n.

This is approximate of course. The true answer is slightly larger, but not much larger. If the pool is 10,000, you must select 119 to expect a duplicate; that's just a bit more than 100. Here is a program that does the math.

This principle has practical applications, such as Pollard rho factorization, described in the next chapter.

A drawer contains n marbles, all of them red, except for one which is green. Pull out a marble and note its color. If it is green, you win! If it is red, toss it aside and try again. How many trials are necessary to win on average?

On the first trial, you pull out a red marble with probability (n-1)/n. Assuming the first marble was red, and discarded, the next trial reveals a red marble with probability (n-2)/(n-1). The game continues through at least k steps with probability (n-1)/n times (n-2)/(n-1) times (n-3)/(n-2) times … (n-k)/(n-k+1). This is a telescoping product, so that the game continues past half the marbles with probability 50%. This is intuitive; you have to pull out and discard about half the marbles before you have an even shot at finding the solitary green marble.

This game is called sampling without replacement, because the red marble is not put back in the drawer. It is tossed aside. Another version of the game, called sampling with replacement, puts the marble back each time. The odds of winning are 1/n for every trial. Now you have to select n marbles, on average, to win. A million marbles, a million trials.

A third variation, somewhat contrived, starts with all red marbles in the drawer. If you pull out a red marble, turn it green and put it back in the drawer. If you pull out a green marble, you win. This reproduces the conditions of the birthday paradox described in the previous section.

We can be a bit more formal about the first two games. The expected value of a game is the sum of each outcome times its probability. This is a weighted sum. In the first game, sampling without replacement, you win at k steps if you have lost through steps k-1, and then you win at step k. This is (n-k+1)/n times 1/(n-k+1), or 1/n. You win at exactly k steps with probability 1/n. Take the sum of k times 1/n and get (n2+n)/2 times 1/n, or (n+1)/2. The expected value of the game, the average time to win, is half of n, which is what we expected.

The second game requires a little calculus. Let x = (n-1)/n, the odds of losing at each step. You win at step k with probability xk-1 times 1/n. Multiply this by k and take the infinite sum. Factor out 1/n, and note that kxk-1 is the derivative of xk. With some help from real analysis, take the sum of xk first, then take the derivative of that. This is a geometric sum, equal to 1/(1-x). Its derivative is 1/(1-x)2. Substitute in for x and get n2. But remember, we factored out 1/n at the outset. Therefore the expected value is n. If you put the marble back each time, you have to pull out n marbles to find the one green marble in the batch. Again, this is a lot of math to prove something farily intuitive.

For those of you who are too young to remember Let's Make A Deal, the host, Monty Hall, stands in front of three closed doors. Behind these doors lie two fabulous prizes and a gag gift, such as a goat eating a copy of War and Peace. He asks you, the contestant, to select a door, then he opens another door revealing one of the two fabulous prizes. He then asks you if you want to change your mind and opt for the remaining door, which you did not originally select. Should you switch?

The answer is clear if there are 500 doors. You pick one, Monty shows you 498 fabulous prizes, and there is one last closed door to consider. You certainly didn't select the one "bad" door at the outset. If you switch now, you're sure to bring home a goat. Stick with your original choice!

The math is not as extreme with three doors, but the answer is the same. Let's say you switch. You selected a good door at the start with probability 2/3, and now you blew it! Only 1/3 of the time, when you selected the bad door, do you become a winner. Now - let's say you stand pat. You're a winner 2/3 of the time, and a loser 1/3 of the time.

The situation is reversed if there is only one fabulous prize. After Monty reveals a worthless pile of empty boxes, you should switch. Again, this is clear with 500 doors, but still true with 3.

Consider a collection of subsets of a universal set U, that intersect arbitrarily. The process known as inclusion exclusion provides a mechanism for counting the total number of elements covered by these sets.

First consider each set independently, and compute the total number of elements therefrom. This is an upper bound, since elements appearing in multiple sets will be counted more than once. Next subtract the elements appearing in pairwise intersections. Then add the elements appearing in three way set intersections. Continue alternating until the set intersection criterion reaches n.

To prove that every element is counted exactly once, consider an arbitrary element x that belongs to exactly r sets. To start the algorithm, add all the elements in all the sets, disregarding overlap. Thus x is counted r times. Next, x appears in (r:2) (r choose 2) pairs of sets, so subtract (r:2) from r. Next, x appears in (r:3) (r choose 3) set triples, So add (r:3) to the total. Continue alternating until (r:r) is reached. Of course r choose r, the last term, is 1. Our element x is only counted once at step r, since there is precisely one collection of r sets that contains x. The element cannot be a member of more than r sets by definition.

Add -1 to the beginning of this series, -1+r-(r:2)+(r:3)-(r:4)+…, and apply the binomial theorem in reverse to get -(1-1)r. This is zero, so the original series added up to 1, and x was counted exactly once.

Inclusion exclusion also works for non-discrete applications, such as determining the area consumed by many intersecting regions. Partition the regions into squares of size ε, and invoke the theorem. Then take the limit as ε approaches 0. In this illustration, the numbers indicate the overlap count. picture of intersecting circles

The number of n-element permutations that never map an element to itself can be determined using the principle of inclusion exclusion. Let U be the universal set of all permutations, which has size n!. Let T be the set of permutations with at least one match. We are interested in U-T.

T is covered by the set of permutations that fix 1, the set of permutations that fix 2, the set of permutations that fix 3, and so on up to the permutations that fix n. Add up the sizes of these n sets, then subtract the pairwise intersections, add the three way intersections, and so on. An example of a 3 way intersection fixes 123, and leaves the rest unconstrained. This is a set of size (n-3)!. And there are (n:3) such sets. Multiply these together to get n!/3!, or in general, n!/i!. T is the alternating sum of n!/i!, starting with i = 1. Subtract this from U and find the alternating sum of n!/i!, starting with 0. Factoring out n! leaves the beginning of the taylor expansion for e-1. Thus the number of no-match permutations approaches n!/e for large n. When n is 10 or greater, the error is less than 1 part per million.

The marriage problem asks how many ways n couples can sit at a round table, such that sexes alternate, and nobody sits by his/her mate. Although recursive solutions exist, deriving a simple formula is notoriously difficult.

Sometimes the alternating restriction is dropped in order to simplify the problem. In this case we can compute the number of permutations using inclusion exclusion. U is all the circular permutations of 2n people, or (2n-1)!. T is the set with at least one couple together.

To seat at least i couples together, select i couples out of n, order them male-female or female-male, and place them at the table along with the other 2n-2i people. This yields a formula:

(n:i) × 2i × (2n-i-1)!

Take the alternating sum of this expression, as i runs from 1 to n, to find T. Subtract this from U to get the no-couple permutations.

When n = 3, U = 120, T = 144 - 72 + 16 = 88, and U-T = 32. Do this manually and find 18 with BBBGGG (B is boy and G is girl), 6 with BBGBGG, 6 with BBGGBG (by symmetry), and 2 with BGBGBG.

The catalan numbers have the formula cat(n) = (2n:n) / (n+1). Starting with n = 0, the sequence looks like this.

1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, …

This is the number of strings of balanced pairs of parentheses. When n = 3 there are 5 possibilities: ((())), (()()), (())(), ()(()), and ()()(). I offer this without proof for now, but the proof will come later, using generating functions and the continuous binomial theorem.

Catalan numbers appear again and again. Here are a few examples.

A binary tree has at most two edges descending from each node, designated left and right. Even a single edge must be designated left or right. For every node, including the root, print a left parenthesis, travel down the left edge and traverse that tree, print a right parenthesis, and travel down the right edge and traverse that tree. The tree can be reconstructed from the string of parentheses, so the correspondence is 1 for 1. Thus cat(n) is the number of binary trees with n nodes.

[ binary tree, root at the top, two branches left and right, a few more branches going down from these. ]

An ordered tree can have many edges descending from each node, but they are numbered left to right. Catalan numbers count the ordered trees with n edges. Traverse the tree recursively, traveling through each subtree in turn, and visiting the branches in order from left to right. Print a left parenthesis as you move down an edge, and print a right parenthesis when you come back up an edge. As a result, the string associated with a subtree will be enclosed in the set of parentheses on the edge just above that subtree. The edges incident to the root correspond to the outermost parentheses in the string. Conversely, the outermost parentheses establish the edges, in order, below the root. Look at the substring between the first set of parentheses. It defines the subtree below the first edge of the root. Do this for the entire string and get the original tree back again.

Consider a polygon that is fixed in the plane. How many ways can you triangulate this polygon?

Let n be 2 less than the number of sides, so that a triangle has n = 1. Of course a triangle is already triangulated, so this can be done in one way.

A square has n = 2, and there are 2 ways to triangulate; draw the diagonal or the other diagonal.

A pentagon has n = 3, and there are 5 ways to triangulate; draw two lines from any of 5 vertices to the two points on the other side. So far the number of triangulations equals cat(n).

Build a correspondence between triangulations and strings of parentheses, starting with a triangle, which corresponds to (). For a larger polygon, number the points clockwise, with 1 at the top. The point just to the left of 1 is n+2. If no edges are incident to 1, then a segment joins n+2 and 2. Print a left parenthesis, convert the smaller polygon, with 2 as its apex, into parentheses, then print a right parenthesis.

[ Octagon with vertex at the top numbered 1, then 2 through 8 around clockwise, edge from 8 to 2, edges from 2 to 6 and 7, edges from 6 to 3 and 4 ]

Assume j is the first vertex that is connected to 1. If j = 3 then we have our triangle. Print a set of parentheses, then encode the rest of the polygon with 1 as the apex.

If j > 3, j and 2 must be connected. Print a left parenthesis, encode the polygon with 2 as apex, print a right parenthesis, and encode the polygon with 1 as apex.

This process can be reversed, giving the original triangulation. Thus cat(n) counts the triangulations of a labeled polygon with n+2 edges.

The arithmetic mean of a set of numbers is their average. Add up all the numbers and divide by n. The average, or mean, or arithmetic mean, of 1, 2, 7, and 28 is (1+2+7+28) / 4, or 9.5.

If all values are positive, the geometric mean is computed by taking logs, finding the arithmetic mean, and taking the exponential. If there are just a few values, the same thing can be accomplished by multiplying them together and taking the nth root. In the arithmetic mean, you add up and divide by n; in the geometric mean, you multiply up and take the nth root. The geometric mean of 21, 24, and 147 is 42.

The geometric mean is used when the log of a measurement is a better indicator (for whatever reason) than the measurement itself. If we wanted to find, for example, the average strength of a solar flare, we might use a geometric mean, because the strength can vary by orders of magnitude. Of course, scientists usually develop logarithmic scales for these phenomenon - such as the ricter scale, decibels, stellar magnitude, and so on. When logs are already implicit in the measurements, we can return to the arithmetic mean.

In every case, the arithmetic mean equals or exceeds the geometric mean. The average of 2, 5, 8, and 9 is 6, yet the geometric mean is 5.18.

Let f be a differentiable function that maps the reals, or an unbroken segment of the reals, into the reals. Let f′ be everywhere positive, and let f′′ be everywhere negative. With a positive derivative, f is invertible. Let g be the inverse of f.

Let s be a finite set of real numbers with mean m. Apply f to s, take the average, and apply g. We'll prove the result is less than m, or equal to m if everything in s is equal to m. When f = log(x), the relationship between the geometric mean and the arithmetic mean falls out.

Shift f(x) up or down, so that f(m) = 0. Let v = f′(m). If x is a value in s less than m, and if f were a straight line with slope v, f(x) would be v×(x-m). Actually f(x) has to be smaller, else the mean value theorem implies a first derivative ≤ v, and a second derivative ≥ 0. On the other side, when x is greater than m, similar reasoning shows f(x) is less than v×(x-m). The entire curve lies below a line l, with slope v, passing through (m,0).

Plot the values of s on the line l. Each y coordinate is v times the difference between the x cordinate and m. The average of all these y coordinates is 0. If f was a line, the line l in particular, f(s) would have a mean of 0. But for every x ≠ m, f(x) is smaller. The y coordinates are all smaller. This pulls the mean below 0, and when we apply f inverse, the result lies below m.

If f′′ is everywhere positive then the opposite is true; the mean of the image of s in f pulls back to a value greater than m.