Abstract Algebra and Discrete Mathematics, Number Theory

Euclid's GCD Algorithm and Unique Factorization (UFD)
Infinitely Many Primes
Irrational Roots
Linear Combination Yields the GCD
A Lattice of Solutions
Modular Inverse
The Chinese Remainder Theorem, Integers
φ and σ Functions
Mersenne and Fermat Primes
Möbius Function
ζ(2)
ζ(4), ζ(6), etc
The Odds of being Coprime
Fermat's Little Theorem
RSA Encryption
Wilson's Theorem
Primitive Root
The Units mod m
The Pseudoprime Test and Carmichael Numbers
A Proof of Primality
Quadratic Residues and the Legendre Symbol
Quadratic Reciprocity
The Jacobi Symbol
Pollard Rho Factorization

If two numbers are divisible by g, then so is their sum and difference. Call the first number g×y and the second g×z, and invoke the distributive property. The sum is g×(y+z) and the difference is g×(y-z). Both are divisible by g. This simple fact can be used again and again to quickly home in on g. Let's look at a simple example.

What number g divides (goes into) both 100 and 36? It must also divide 100-36. In fact it divides 100-36-36, which is the remainder of 100÷36. This remainder, 28, is necessarily smaller than 36 - we've made progress. But what divides evenly into 36 and 28? Subtract again to get 8. Then divide 28 by 8, giving a remainder of 4. Finally 8 and 4 produce a remainder of 0, terminating the algorithm. Now 4 divides both 100 and 36, and if any other number, such as 2, also divides 100 and 36, it must divide evenly into 4. After all, 2 is subject to the same procedure, dividing evenly into all the differences and remainders, all the way down to 4.

The above procedure, called Euclid's gcd algorithm, extracts the greatest common divisor g, because any other common divisor f divides evenly into each difference and remainder along the way, until finally f divides g. The result, g, is truly a "greatest common divisor", since it contains every other common divisor f.

If g is 1, the two numbers are "relatively prime", or "coprime". They have no factors in common.

This algorithm is stunningly efficient, and can find the gcd of large numbers, thousands of digits long. It has been generalized to other domains, e.g. the gcd of two polynomials, as described in chapter 1. It is hard to overstate the brilliance, utility, and beauty of this simple idea, developed some 2,300 years ago.

The least common multiple of a and b, written lcm(a,b), is l, if a and b divide l, and every other multiple of a and b is a multiple of l. We'll show that gcd and lcm go hand in hand, one implies the other, but first a few definitions.

A domain has left and right cancellation. That is, b*x = b*y implies x = y when b is nonzero. This is the case for the integers, rationals, and reals.

An equivalent definition says there are no zero divisors. If cancellation fails we have b*(x-y) = 0, making b and x-y zero divisors. Conversely, if we have zero divisors b and x, b*x = b*0, and cancellation fails.

An integral domain is a commutative domain. It looks somewhat like the integers. There are no zero divisors, and multiplication is commutative.

When working in an integral domain, gcd implies lcm, and vice versa. Hang on, there's a lot of algebra here, but also a core of common sense. If you have g, the gcd of a and b, and you're not sure about l, try l = ab/g. And if you have l and are looking for g, try g = ab/l. If a = 6 and b = 10, then g = 2 and l = 30. It works out.

Given a, b, and g, where g is gcd(a,b), let l = ab/g. This is well defined, since g divides either a or b.

Let m be any multiple of a and b, and let h = gcd(l,m). By the definition of gcd, h divides l, so write hk = l.

Now hk = ab/g, or hkg = ab. Since a and b divide l and m, a and b both divide h. Divide through by a and get (h/a)kg = b. Thus kg divides b. By symmetry, kg also divides a, hence kg divides g. This makes k a unit, 1 or -1 in the integers. With hk = l, h and l are associates. I'm abusing the notation a bit, but let's just say h = l. Since h divides m, l divides m, and m is indeed a multiple of l, making l the lcm of a and b.

Now for the converse, wherein every two elements have an lcm. Given a, b, and l, where l = lcm(a,b), let g*l = a*b. In other words, ab is a multiple of a and b, it must be gl for some g.

Since l/b is well defined, g divides a, and by symmetry g divides b. It is a common divisor, but is it the greatest common divisor?

Let f be some other common factor of a and b. Let h = lcm(f,g) = gk.

Rewrite gl = ab as hl = kab.

Since a is a multiple of f and g, it is a multiple of h. Divide through by h and get l = kb*(a/h).

Remember that a is a multiple of f and g, hence a multiple of their lcm, or kg. And since k divides a and a divides l, k divides l. Divide through by k and get l/k = b*(a/h). This means b divides l/k. By symmetry, a also divides l/k. Since l/k is a multiple of a and b, and l is the least common multiple, l/k is a multiple of l, and k is a unit.

The least common multiple of f and g is gk, or g, and f has to divide into its least common multiple, hence f divides g. This makes g the greatest common divisor of a and b.

In an integral domain, the existence of a gcd for any two elements implies an lcm for any two elements, and conversely.

Euclid was pretty sure that a gcd algorithm, any gcd algorithm, implies unique factorization, but it took the genius of Gauss to fill in the details. Yes indeed, every integer is uniquely a product of prime numbers. The word prime comes from latin primus, meaning first; these are the first numbers, the building blocks for all the others. Related words include primal (basic, earliest), premier (best), and even prime as in prime specimen, having the original and perfect form.

If g is the gcd of s and t, what is the gcd of cs and ct? Certainly cg (c×g) divides both cs and ct. If the greatest common divisor is actually larger, let cgk divide cs and ct, where k takes up the slack. Divide through by c, and gk divides both s and t. This is a contradiction, since g is already the gcd of s and t. Thus cg is the gcd of cs and ct. This can be summarized by saying scaling distributes over gcd. Multiply two numbers by c, and that multiplies their gcd by c.

Consider g = gcd(gcd(a,b),c). Clearly g divides a b and c. If f divides a b and c then f divides gcd(a,b), and c, and f divides g. Since gcd is a valid procedure, g exists, and g divides a b and c, and every f dividing a b and c divides g. If g′ was some other gcd of a b and c then g′ and g would divide each other, hence g is unique (assuming g is positive or otherwise represents its associates).

By symmetry, gcd(a,gcd(b,c)) is also a divisor of a b and c, where every other f dividing a b and c divides into g. It's the same thing. This makes gcd associative: gcd(gcd(a,b),c) = gcd(a,gcd(b,c)).

A similar proof shows lcm is associative.

The next step is to compare prime and irreducible integers, and from there it is easy to prove unique factorization. But prime is not what you think it is.

An element is irreducible if it cannot be written as the product of two other nonunits. What does this mean in the integers? 17 is irreducible because it is not the product of two other numbers, unless one of those numbers is ±1. We typically call this a prime number, but this is in fact the definition of an irreducible number, a number that cannot be "reduced", or separated into two pieces.

A number p is prime if it is not 1 or -1, and it divides a or b whenever it divides ab. If p goes into a product evenly, then p goes into one of the operands. By extension, if p divides abc then p divides a b or c, and so on for any finite product. You probably never saw prime defined this way before, perhaps because prime and irreducible are the same thing in the integers, so it almost doesn't matter; but in other worlds prime and irreducible are not the same, so we have to make a distinction. To reprise: a nonunit p is prime if p divides a or b whenever p divides ab, and a nonunit c is irreducible if it cannot be written as the product of two nonunits a*b.

A prime is automatically irreducible. Suppose p is reducible, such that p = ab. Since p divides ab it divides one of them, say a. Cancel p from both sides and find that b times a/p = 1, whence b is a unit after all. This is a contradiction. So prime implies irreducible.

The key here is to show that every irreducible number is prime, so that the two concepts coincide.

Lemma: if p is coprime to a and b then p is coprime to ab. With p coprime to a and b we have the following.

gcd(p, gcd(p, a) × b) = 1

gcd(p, gcd(pb, ab)) = 1 (scaling)

gcd(gcd(p, pb), ab) = 1 (associativity)

gcd(p, ab) = 1 (gcd(p,pb) is p)

Now assume p is irreducible. If p divides a, e.g. pk = a, then gcd(p,a) is not 1. Conversely, if gcd(p,a) is g ≠ 1, we have p = gk and a = gl. Since p is irreducible, p = g, and p divides a. Thus p divides a iff p is not coprime to a.

Assume p does not divide a or b, hence p is coprime to a and b. It is then coprime to ab, and does not divide ab. Turn this around, and if p divides ab then p divides a or b. This makes p prime. Irreducible and prime are the same thing. In an integral domain, the existence of gcd, or lcm if you prefer, causes irreducible and prime to coincide.

We need an integral domain, and irreducible = prime, and one more thing. Every n must factor into finitely many irreducibles. This is easy; if n is irreducible we are done, it not then split n into a×b, and being smaller, each a and b has a finite factorization. Put them together to find a finite factorization for n. The integers form a factorization domain. Numbers can be factored. There exists at least one finite factorization for n. But is the prime factorization unique?

Of all the numbers with multiple factorizations, Let n be a number with the shortest factorization. If n is prime, it cannot be the product of other primes. There is no "other" factorization, and that is a contradiction. Hence n is composite, and the shorter factorization of n has at least two primes. The "other" factorization may contain more, or it could contain two different primes.

Let p be a prime in the first representation, and q a prime in the second. We don't have to worry about p = q, else we would divide through by p and find another n with two different, shorter factorizations. The primes in the two factorizations are separate from one another.

Let n = q×t, where t takes up the slack in the second factorization. If p divides evenly into t, write n/p as the product of q and the primes of t/p, using any of the possible factorizations of t/p. At the same time, take p out of the first factorization and write n/p as a shorter product of primes. Since q does not appear in the first list, we found another number with multiple factorizations, and the first factorization contains fewer primes. We've already picked the smallest, so p does not divide t. It obviously doesn't divide q, q being prime. Since p is prime it can't divide q×t either, but it does - thus a contradiction. Hence there is no number with two different factorizations. Every number splits uniquely into a product of primes. This is the fundamental theorem of arithmetic.

Let's go the other way; assume the integers form a unique factorization domain, also known as a ufd, so that each n is uniquely a product of irreducibles. Suppose p is irreducible, but not prime. Let p divide ab, thus pk = ab. Separate k a and b into irreducibles. Of course p is already irreducible. By uniqueness, p appears somewhere on the right, somewhere in a or b. Thus p divides a or b. Once again irreducible = prime.

Given unique factorization, write a and b as products of primes, and let g consist of the primes common to both. If f divides a its primes are a subset of those in a, and similarly for b. The primes of f are common to both lists, and are included in the primes of g. Thus f divides g, and g = gcd(a,b).

In an integral domain and factorization domain, the following are equivalent.

Every two elements have a gcd.
Every two elements have a lcm.
Every n is uniquely a product of irreducibles.
Every p is irreducible iff it is prime.

In chapter 1 I demonstrated a gcd algorithm for polynomials over R, or Q, or any domain where division is well defined. Let's complete the picture. Multiply two nonzero polynomials together to get a nonzero polynomial. Just look at the lead terms of highest degree. There are no zero divisors, thus an integral domain. Finally, show finite factorization by noting that a polynomial is irreducible, or it is the product of polynomials of lesser degree. That's all we need. Every polynomial splits uniquely into a product of prime polynomials. Example: x⁴ + 5x² + 4 becomes (x²+1) * (x²+4) over the reals. These quadratics are prime and do not split further. However, in the complex numbers the polynomial splits into (x+i) * (x-i) * (x+2i) * (x-2i). The field of coefficients matters.

There are some integral domains that have no primes at all. Every real number, other than 0, is invertible, and a unit. Technically, every irreducible number is prime, since there are no irreducibles. Technically, this is a unique factorization domain with no primes.

Set this aside for now and assume there is at least one prime in a ufd, and that these primes can be ordered in some way, as is the case with the integers. Suppose there are finitely many primes. Multiply them together and add 1, giving n. Now n is not prime - because it is larger than all the primes on our list, which is suppose to be complete. So n is composite. Let p be a prime in the unique factorization of n. Since p is on the list, it divides n-1, as well as n. Hence it divides 1, which is impossible. There are infinitely many prime numbers.

The same proof shows there are infinitely many prime polynomials over the rationals, or the reals. Given a finite list of prime polynomials, multiply them together, multiply by x (incrementing the degree), and add 1. Then apply the previous proof.

If you want a ufd with one and only one prime, try the rationals with odd denominators. The only prime is 2.

In the integers, the gaps between prime numbers can be arbitrarily large. Recall that n!, or n factorial, is the product of the first n integers. Now n!+2 is divisible by 2, n!+3 is divisible by 3, and so on through n!+n, giving n-1 composite numbers in a row.

Primes that are only 2 apart are called twin primes. Examples include 11 and 13, and 71 and 73. We believe, but have not yet proved, that there are infinitely many twin primes.

As mentioned in chapter 1, the square root of 2 was considered irrational, because it is not the ratio of two integers. Now we can prove it.

Suppose the j^th root of n is a rational number a/b, and write a^j = nb^j. Assume a and b are relatively prime. In other words, the fraction a/b is reduced to lowest terms. The right side of the equation is divisible by the primes of b, and by unique factorization, those same primes divide a, which contradicts a and b relatively prime. So b = 1, and the j^th root of n is an integer, namely a. The square root of 9 is 3, all good, but the square root of 10 is irrational, not a clean ratio, not "reasonable".

When we start imagining the square root of -1, well, that's imaginary. And trying to mix real and imaginary numbers; that's pretty complex. The words actually reflect the evolution of mathematics.

Let g be the gcd of s and t. Are there coefficients x and y such that xs + yt = g? There are, and Euclid's algorithm makes it easy to find them.

For convenience, think of s as t₀, and think of t as t₁. The first step in the gcd algorithm computes t₂ as the remainder when t₀ is divided by t₁. We can write t₀ = q₁×t₁ + t₂. Here q₁ is the first quotient. The same procedure is then applied to t₁ and t₂, allowing us to write t₁ = q₂×t₂ + t₃. Continue writing t_i-1 = q_i×t_i + t_i+1, until t_n = g. Turn the last equation around, so that g = t_n-2 - q_n-1×t_n-1. Replace t_n-1 with t_n-3 - q_n-2×t_n-2. Replace t_n-2 with a comparable expression in t_n-4 and t_n-3. Continue backtracking until g is a linear combination of t₀ and t₁, better known as s and t. Here is the procedure applied to 100 and 36. (Recall that we found the gcd of 100 and 36 at the start of this chapter.)

100 = 2×36 + 28
36 = 1×28 + 8
28 = 3×8 + 4

100 = 2×36 + 28
4 = 28 - 3×8
4 = 28 - 3×(36-28)
4 = 4×28 - 3×36
4 = 4×(100-2×36) - 3×36
4 = 4×100 - 11×36

Here is some code to do the calculations. Is there a way to do it without a stack? Hmm - sounds like one of those Google interview questions.

/* find a linear combination of a and b that yields the gcd */
/* assumes a and b are greater than 0 */
linearCombination(a, b)
{
	stack s;

	while(b != 0) {
		q = a/b;
		push(s, q); /* remember the quotient */
		r = a%b; /* remainder */
		a = b; b = r; /* push back */
	}

	print "gcd is "a"\n";

	/* now backtrack; last quotient doesn't mean anything */
	pop(s);
	/* if b was already the gcd */
	if(empty(s)) {
		print "coefficients are 0 and 1\n";
		return;
	}

	a = 1; b = -pop(s);
	while(not empty(s)) {
		a = a - b*pop(s);
		temp = a; a = b; b = temp; /* swap */
	}
	print "coefficients are "a" and "b"\n";
}

This can be generalized to the gcd of several numbers. Let s₁ through s_n be a finite set of nonzero integers. Derive the gcd of this set as follows. Let g₂ = gcd(s₁,s₂). Thereafter, let g_i+1 = gcd(g_i,s_i+1). Finally g_n is the gcd of the entire set. Use the backtracking procedure to write g_n as a linear combination of g_n-1 and s_n. Then write g_n-1 as a linear combination of g_n-2 and s_n-1. Continue until g₂ becomes a linear combination of s₁ and s₂. Now g_n, the final gcd, is a linear combination of the values s₁ through s_n.

For fixed nonzero integers s and t, find all integers x and y such that xs + yt = 0. This is an example of a diophantine equation, where the variables must be integers.

For clarity, write xs = -yt. Clearly x and y are restricted to a line in the plane passing through the origin, with slope -s/t. If g is the gcd of s and t, divide through by g. Since s/g and t/g are now relatively prime, s/g always divides y, and t/g always divides x. In fact x ÷ t/g = -y ÷ s/g. Call this common quotient k. Conversely, for any integer k, we can construct a pair x y that satisfies the equation, namely x = kt/g and y = -ks/g.

The solutions form a regularly spaced series of dots along the slanted line in the plane, one dot for each value of k. The origin corresponds to k = 0. This is a one dimensional lattice.

Find all x and y satisfying xs + yt = n.

If n is 0 we are done, as shown above. If g, the gcd of s and t, does not divide n, there are no solutions. If n = g×c, use Euclid's algorithm to find a solution for xs + yt = g, then multiply x and y by c.

If there are two different solutions to xs + yt = n, subtract them to find a point on the lattice described above. Conversely, any point on this lattice can always be added to a base solution to get another solution to xs + yt = n. The solutions are precisely a shifted version of the aforementioned lattice. The line of points doesn't pass through the origin any more; it is shifted so that the sum xs+yt is always n, rather than 0.

Find integers x, y, and z satisfying xs + yt + zu = 0.

When z = 0, solve xs + yt = 0, as we did before. This gives a series of colinear points in the z=0 plane, passing through the origin. If g, the gcd of s and t, does not divide u, z is forced to take up the slack. Let v be that portion of g that is not covered by u, and set z = v. Now solve xs + yt = -uv, as we did above. This gives a shifted version of the line of points. In other words, as we move up one level, to z = v, the line of points shifts by just the right amount to produce -uv. Advance to z = 2v, and the base solution is doubled. In other words, the line of points is shifted by twice as much, to produce -2uv. This continues forever, for all levels z = kv. The integer solutions form a two dimensional lattice within a tilted plane that cuts through the origin. If we concentrate on the plane itself, the lattice defines a repeating pattern of parallelograms. Each parallelogram is a copy of the base cell.

If the equation is xs + ty + zu = n, use Euclid's algorithm with backtracking to find a particular solution, then shift the previous planar lattice by that much. Now the plane of parallelograms, a two dimensional lattice, has been pushed away from the origin.

As you have probably surmised, this procedure continues through any number of variables and dimensions.

You've probably seen brain-teasers like this one before. Horses cost $10, pigs cost $3, and rabbits are only $0.50. A farmer buys 100 animals for $100, How many of each animal did he buy?

First we need to combine the two equations into one. Let h be the number of horses, p the number of pigs, and r the number of rabbits. Double the prices, just for a moment, so we can write 20h + 6p + r = 200. At the same time, h + p + r = 100. Subtract the second from the first to get 19h + 5p = 100. Since 19 and 5 are coprime, h steps up in increments of 5, and p drops in increments of 19. Run Euclid's algorithm, and the coefficients that create 1 are -1 and 4. A negative horse and 4 pigs makes 1. Multiply this by 100 to get -100 and 400. The lattice has been shifted. Of course we want positive animals, so we need that stretch of the line that crosses the first quadrant. Push h up by 5×20, and p has to drop by 19×20. That's 0 horses, 20 pigs, and 80 rabbits. Move to the next dot on the line, 5 horses, 1 pig, and 94 rabbits. The next dot is outside the first quadrant, with -18 pigs, so there are only two solutions.

To find the inverse of x mod m, solve km + yx = 1. Thus y is the inverse of x. This is another linear combination problem, but we don't care about k, and y only matters mod m, so there is no need for the stack.

Apply the gcd algorithm to m and x, until you finally reach 1. If the gcd is not 1, x is not invertible mod m. We saw this before; the right side must be divisible by gcd(m,x).

Recall our earlier notation, wherein t₀ = m, t₁ = x, t₂ = m%x, and t_i are the subsequent remainders. At each step of the algorithm, maintain two new variables a and b, such that ax = t_i-1 and bx = t_i, mod m. At the start, a = 0 and b = 1. Sure enough, ax = t₀ = 0 (mod m), and bx = t₁ = x. At each step, the values of t shift, hence a is set to b. At the same time, b becomes a-qb, where q is the quotient produced by the gcd algorithm. Of course a and b are reduced mod m at each step. At the end, t_i becomes 1, and ax = 1, and a is the inverse of x mod m. Here is some code.

modularInverse(m, x)
{
	if(x == 0) return -1; /* 0 not invertible */
	s = m, t = x; /* set up for gcd(x,m) */
	a = 0; b = 1; /* 0*x = s and 1*x = t */
	while(t != 0) {
		q = s/t, r = s%t; /* quotient and remainder */
		s = t, t = r;
		temp = (a-b*q) % m;
		a = b, b = temp;
	}
	if(s != 1) return -1; /* not invertible */
	return a;
}

Given a set of values c₁ c₂ … c_n, and a set of mutually coprime moduli m₁ m₂ … m_n, is there an integer x such that x = c_i mod m_i for each i in 1 through n?

Let z be the product of all the moduli. If x is a solution then so is x plus any multiple of z. Conversely, if w is not a multiple of z, say w is not divisible by m₁, then x+w will not equal c₁ mod m₁, and x+w will not be a solution. The solution, if it exists, is well defined mod z.

To show that a solution exists, you simply construct one. Let a_i be the product of all the moduli other than m_i. Verify that a_i and m_i are coprime. Let b_i be the inverse of a_i mod m_i. Finally, let x be the sum of a_ib_ic_i for all i in 1 … n. Reduce x mod m_i for each i, and verify that x satisfies all n equations simultaneously.

In fact this map, reducing x mod m_i for each i, is an isomorphism from the integers mod z onto all the combinations of integers mod m_i. It respects addition and multiplication, because reducing mod m_i respects addition and multiplication for each i. That makes it a homomorphism. And since there is a procedure to reverse the map, going from c₁ c₂ … c_n back to a unique x mod z, it is an isomorphism. There is really no difference between Z/84, and Z/3 cross Z/4 cross Z/7. The integers mod 84 look just like three wheels spinning in parallel: one is [0,1,2], the next one is [0,1,2,3], and the last one is [0,1,2,3,4,5,6]. Given a number from 0 to 83, reduce it mod 3, mod 4, and mod 7 to position the wheels. The somewhat complicated formula given earlier reverses the process.

For any positive integer n, the function div(n) is the number of positive divisors of n, including 1 and n.

The function φ(n) is the number of elements relatively prime to n. This is also called the Euler totient function.

The function σ(n) is the sum of all the divisors of n.

A number is perfect if σ(n) is exactly twice n. The number is deficient if σ(n) is less than 2n, and abundant if σ(n) exceeds 2n. N is "almost perfect" if σ(n) = 2n-1, and "quasi perfect" if σ(n) = 2n+1. If n is a power of 2, such as 16, σ(n) is 1+2+4+8+16 = 31, making n almost perfect. No quasi perfect numbers are known.

Once n is factored, div, φ, and σ are easily calculated. Consider 45 = 3 squared times 5. Each factor has zero one or two instances of the prime 3, and zero or 1 instances of the prime 5. That's 3 possibilities cross 2 possibilities, or 6 distinct factors. (The factors are 1, 3, 9, 5, 15, and 45.) If the factorization of n includes j distinct primes, having exponents e₁ e₂ … e_j, add one to all the exponents and multiply them together to get div(n).

Compute φ(n) by counting the numbers below n that are coprime to n. Verify that c is coprime to n iff it is coprime to all the prime factors of n. Thanks to the chinese remainder theorem, the values of c mod n are determined by the values of c mod the prime factors of n. Thus we can count the values coprime to p^e, for each p dividing n, and multiply these quantities together to obtain the number of values coprime to n. The values coprime to p^e are precisely the numbers not divisible by p, namely p^e-p^e-1. Returning to φ(45), multiply 9-3 by 5-1 to obtain 24.

To compute σ(45), realize that the divisors can be partitioned into those that are not divisible by 3, those that have one factor of 3, and those that have 2 factors of 3. In each case, the powers of 3, if any, are applied to 1 and 5. Thus σ(45) is (1+3+9) × (1+5), or 13×6, or 78. (Verify this by adding 1+3+9+5+15+45.) Since 78 is less than 90, 45 is deficient, as are most numbers.

Like div(n) and φ(n), σ(n) is a product of expressions, one expression for each distinct prime dividing n. In this case the expression associated with p^e is the sum of the powers of p: 1 + p + p² + p³ + … + p^e. For instance, σ(168) = (1+2+4+8) × (1+3) times (1+7) = 480, an abundant number.

Verify that σ(s×t) is σ(s) × σ(t), provided s and t are relatively prime. This is called a multiplicative function. Div(n) and φ(n) are also multiplicative functions.

If x = σ(n), and k is coprime to n, σ(kn) > kx. In other words, bringing in more factors increases σ faster than n. This is because each prime factor p in k contributes at least p+1 to σ. Similarly, increasing the exponent on a prime in n increases σ faster than n. If σ includes the factor 1+p, and we bring in another p, n is multiplied by p, but σ(n) is multiplied by (1+p+p²)/(1+p), which is a ratio larger than p. Once a number is abundant, all multiples of that number are abundant.

A Mersenne prime is a prime that is one less than a power of 2. Examples include 3, 7, and 31.

The exponent on a Mersenne prime must also be prime. To illustrate, consider 2¹⁵-1. Now 15 is not prime, in fact it is 3×5, so replace 2³ with 8, and write 8⁵-1. This is divisible by 8-1, just as xⁿ-1 is divisible by x-1.

If p is a Mersenne prime, say 2^k-1, then consider n = p×2^k-1. Let's compute σ(n). There is one instance of p, so the first component of σ is 1+p. We know this better as 2^k. The second factor of n is 2 raised to the k-1. Hence the second component of σ becomes 1+2+4+8+16+…+2^k-1. This is equal to 2^k-1. We know this better as p. Therefore σ(n) = 2^k×p, which is twice n, hence n is a perfect number. Perfect numbers include 6, 28, and 496, corresponding to the first three Mersenne primes.

Conversely, let n be a perfect even number. Let n contain k-1 powers of 2. Thus σ(n) includes 2^k-1, which I will call j. If σ(n) = 2n, then j, an odd number, also divides n. If j is prime, it contributes 1+j, or 2^k, to σ(n). This makes j a Mersenne prime, and n is a perfect even number, as described above. If j is anything else, a power of a prime, or a product of multiple primes, σ(j) exceeds j+1. With σ(j) > 2^k, n becomes abundant. Any additional factors, besides 2 and j, will also make n abundant. Therefore all perfect even numbers have been characterized. They correspond one for one with the Mersenne primes.

No perfect odd numbers have been found.

A Fermat prime is one greater than a power of 2. Examples include 3, 5, 17, 257, and 65537.

Consider the number 2^hn+1, where n is odd and > 1, and h is itself a power of 2. If x is 2^h, then our fermat prime can be written as xⁿ+1, which is divisible by x+1. To be prime, n must equal 1, whence the exponent is a power of 2. Review the 5 examples above. Each is 2 to a power of 2, plus 1. In fact, these are the only known Fermat primes, and we believe there are no others.

Fermat primes correspond to constructible regular n-gons, as will be demonstrated later. Thus there are darn few constructible n-gons.

If n is a positive integer, the mobius function μ(n), named after August Möbius, is 0 if p² divides n for some prime p, 1 if n is square free and contains an even number of prime factors, and -1 if n is square free and contains an odd number of prime factors. A literal interpretation gives μ(1) = 1. Verify that μ(n) is a multiplicative function. Since 9 and 17 are coprime, μ(153) = μ(9) × μ(17).

Given n > 1, consider the sum of μ(d) for all divisors d dividing n. Split n into primes, and count the squarefree factors having precisely k distinct primes. If there are t distinct primes in the representation of n, then there are t choose k such factors. This is multiplied by μ, which is (-1)^k. To cover all the factors, take the sum of t choose k times (-1)^k as k runs from 0 to t. Use the binomial theorem to rewrite this sum as (1-1)^t, or 0. The sum of μ over all the divisors of n is 0.

Oddly enough, the sum is still 0, even if we consider the divisors d that are divisible by a base divisor b. If a square divides b then the sum is 0, so assume b has l distinct primes. Let k run from 0 to t-l, as we consider divisors that have k distinct primes beyond b. The l primes in b contribute 1 or -1; this is a constant. The additional k factors introduce (-1)^k. Use the binomial theorem again to get (1-1)^t-l, which is still 0. This breaks down if b = n. That forces t-l = 0, and the binomial theorem doesn't work when the exponent is 0.

If you haven't seen double integration before, you may want to skip this section. It stands alone, and is not used elsewhere in this book. However, if you have had calculus, you should read on. This is another lovely connection between continuous and discrete math; using an integral to answer a question about when two numbers are relatively prime.

If s is a positive integer, the Riemann zeta function ζ(s) is the sum of 1/n^s as n runs from 1 to infinity. If s = 1, ζ(s) is the harmonic series, which does not converge. For larger s, the series is dominated by the integral of x^-s, as x runs from 1 to infinity, hence the series converges. In fact s can be any real number > 1, and ζ(s) is well defined.

A further generalization allows s to be any complex number, whence ζ(s) becomes a function on the complex plane. This is central to analytic number theory, but for now we're only interested in ζ(2), as a lemma for the next theorem.

Let q = ζ(2), the sum of the reciprocal squares. Consider only the even terms of the series, 1/4 + 1/16 + 1/36 + …, and the result is ¼q. Thus the odd terms sum to ¾q. Compute the sum of the odd terms, then multiply by 4/3 to get q.

Let the series r_n = 1/(2n+1)², as n runs from 0 to infinity. As mentioned above, r contains the odd terms of ζ(2): 1 + 1/9 + 1/25 + 1/49 …

Consider the function f(x) = x²ⁿ. Integrate this from 0 to 1 and obtain 1/(2n+1).

Then consider f(x,y) = x²ⁿy²ⁿ = (xy)²ⁿ. Integrate this over the unit square, as x runs from 0 to 1 and y runs from 0 to 1, and obtain 1/(2n+1)², or r_n. The n^th term of the series r has been expressed as a double integral over the unit square.

The series r is the sum of integrals; rewrite this as the integral of the sum. The justification for this interchange is part of real analysis, and is beyond the scope of this book.

∫ ∑ (xy)²ⁿ

For a fixed point x,y in the unit square, the terms of the sum are (xy)²ⁿ, which is a geometric series. Therefore the sum is 1 over 1-(xy)². Integrate this over the unit square.

Integrate by trig substitution as follows.

x ← sin(u) / cos(v)
y ← sin(v) / cos(u)

The denominator, 1 - (xy)², becomes 1 - sin²(u)sin²(v) / cos²(u)cos²(v). This looks like a mess, especially since it sits in the denominator, but let's persevere and evaluate the jacobian.

cos(u) / cos(v)	sin(u)sin(v) / cos(v)²
sin(u)sin(v) / cos(u)²,	cos(v) / cos(u)

Compute the determinant of this matrix. The product of the main diagonal is 1, and the product of the antidiagonal is sin²(u)sin²(v) / cos²(u)cos²(v). The jacobian is exactly the same as the denominator, 1 - (xy)², hence the reparameterized integrand simplifies to 1. We only need find the area of the corresponding shape in the uv plane.

Consider the triangle bounded by the u and v axes, and by the line u+v ≤ π/2. Start by looking at the border of this triangle.

The bottom of the triangle (v = 0) maps to the bottom of the square via sin(u), though not defined at u = π/2. The left edge of the triangle (u = 0) maps to the left edge of the square via sin(v), though not defined at v = π/2. The entire hypotenuse, u+v = π/2, sans the endpoints, maps to the point x = y = 1.

The interior of the triangle becomes the interior of the square. The diagonal line u = v from the origin to π/4,π/4 maps to the diagonal of the square via tan(u),tan(u). Now traveling in the other direction, draw a line parallel to the hypotenuse inside the triangle, at u + v = t. This maps to a curve starting at 0,sin(t), passing through tan(t/2),tan(t/2) on the diagonal, and over to 0,sin(t). Move the line towards the hypotenuse, and the curve sweeps up and to the right. Near the hypotenuse, the resulting curve hugs the right edge and the top of the square.

Aside from two border discontinuities at the corners, the triangle maps continuously onto the square. The area of the uv triangle is π²/8. This is equal to the double integral of 1 over 1-(xy)² on the unit square.

Multiply this value by 4/3, and ζ(2) = π²/6.

Although it isn't needed for the next theorem, let's compute ζ(k) for k any positive even number. In other words, what is the sum of 1/n⁴, 1/n⁶, 1/n⁸, etc. This is more general than the proof given in the previous section, which was specific to ζ(2), but it requires knowledge of fourier transforms. If you are not familiar with this technique you can skip this, and continue with the odds of being relatively prime, which only requires ζ(2).

Let f(x) = x^k from -π to π. Thereafter f is periodic, with period 2π. It looks like a long train of U's. Since f attains the same value, namely π^k, at the start and end of its period, f is continuous across the entire x axis. (This is where we need k to be even.) A continuous function agrees with its fourier series, so on we go.

Take the fourier transform of this function, using sines and cosines. Since f is even, the sine terms drop out, leaving only the cosines. From here we can, if we wish, integrate from 0 to π and then double the result.

The first fourier coefficient is the integral of 1 times f(x), divided by 2π, which yields π^k/(k+1).

By parts, the integral of x^kcos(nx) is:

x^ksin(nx)/n - k/n × ∫ x^k-1sin(nx)

The first term is 0 when evaluated at π or -π, leaving only the second integral. By parts again gives:

k/n × { x^k-1cos(nx)/n - (k-1)/n × ∫ x^k-2cos(nx) }

The first term inside the braces becomes 2π^k-1, or the opposite if n is odd. The second is the integral for k-2, the previous even integer.

If i(k) is the integral from -π to π, then we have the following recurrence relation.

i(k) = k/n² × { ±2π^k-1 - (k-1)×i(k-2) }

Again, the first term inside the braces is positive for n even and negative for n odd.

At the base of the relation, when k = 0, i(0) = 0 for every n. This because the integral of x⁰cos(nx) from -π to π is 0.

When k = 2, i(2) = ±4π/n², positive for n even and negative for n odd. Remember that the integral is divided by π to give the fourier coefficient, hence the coefficient on cos(nx) is ±4/n². The fourier expansion looks like this.

x² = π²/3 + 4 × { -cos(x) + cos(2x)/4 - cos(3x)/9 + cos(4x)/16 - …}

Replace x with π, whence cos(nx) becomes 1 or -1, for n even or odd, and get 2π²/3 = 4ζ(2). Divide through by 4, and ζ(2) = π²/6, which confirms the earlier result.

Now let's derive ζ(4); higher values of k can be addressed in a similar fashion. Invoke the recurrence relation with k = 4.

i(4) = 4/n² × { ±2π³ - 3×i(k-2) }

Substitute in for i(2) and get this equation for the definite integral at k = 4.

i(4) = ±{ 8π³/n² - 48π/n⁴ }

Once again this is divided by π to give the fourier coefficients.

x⁴ = π⁴/5
+ 8π² {-cos(x) + cos(2x)/4 - cos(3x)/9 + cos(4x)/16 - …}
- 48 {-cos(x) + cos(2x)/16 - cos(3x)/81 + cos(4x)/256 - …}

Substitute x = π as before. Move π⁴/5 to the left and get 4π⁴/5. The next expression is 8π²ζ(2), or 4π⁴/3. Bring this to the left and get -8π⁴/15. This equals -48ζ(4). Therefore ζ(4) = π⁴/90. This is approximately 1.082.

Here is a quick synopsis of ζ(6).

i(6) = ±{ 12π⁵/n² - 240π³/n⁴ + 1440π/n⁶ }

6π⁶/7 = 12π⁴ζ(2) - 240π²ζ(4) + 1440ζ(6)

ζ(6) = π⁶/945, approximately 1.017.

When k is odd, ζ(k) is notoriously difficult to analyze. Roger Apéry proved, in 1978, that ζ(3) is irrational. So we're still homing in on this one.

Here is a small program to approximate various values of ζ.

Some series look a bit like ζ(2), with a quadratic downstairs; and these may be easier, or harder, to analyze. The sum of 1 over (n-a)*(n-b), is easy, because it telescopes. If a and b differ by d, then d separate subseries collapse down to their lead elements. An example is 1 over n²-1, the bottom being n-1 times n+1. Each term is half of 1/(n-1) - (1/(n+1). Start with n = 2, to avoid the pesky term 1/0, and find:

½ { (1-1/3) + (1/3-1/5) + (1/5-1/7) + (1/7-1/9) + (1/9-1/11) … }
+ ½ { (1/2-1/4) + (1/4-1/6) + (1/6-1/8) + (1/8-1/10) … }
= ½ { 1 + 1/2 }, or 0.75

Other quadratics are harder. I will run through 1 over n²+1, quickly, and then move on. This involves yet more fourier analysis. Let f = e^x from 0 to π, and e^-x from -π to 0. Since f is even we can ignore the sines and focus on cosines. Integrate f times cos(nx) from 0 to π, double it for the full period, and divide by π for the fourier coefficient a_n. Of course the first coefficient a₀ is the average of f, which is (e^π-1)/π. The n^th integral is e^x*(n*sin(nx)+cos(nx)) / (n²+1). Evaluate at 0 and π and get (±e^π-1) / (n²+1), the first term being negative for n odd and positive for n even. Multiply by 2/π for the fourier coefficient a_n.

Evaluate f(π) - f(0), using the fourier series. The even terms drop away, leaving 4/π × (e^π+1) × ∑ 1/(n²+1) for n odd. In the same vein, evaluate f(π) + f(0), giving 2×(e^π-1)/π + 4/π × (e^π-1) × ∑ 1/(n²+1) for n even. After a lot of algebra, and conversion to hyperbolic trig functions, which are more concise in this case, ∑ 1/(n²+1) = ½(π/tanh(π)-1).

Ok, that's enough of that; let's get back to number theory.

What are the odds that two numbers, chosen at random, are relatively prime?

Actually we can't choose a number at random from an infinite set, so let's rephrase the question. Find the probability that two numbers, drawn from 1 to n, are relatively prime, and take the limit as n approaches infinity.

Two numbers are both divisible by p with probability (1/p)². In other words, each number is divisible by p with probability 1/p. This probability may not be exact, due to the remainder of n mod p, but as n grows large the probability approaches 1/p². Taking the complement, two numbers fail to have a common factor p with probability 1 - 1/p².

By the chinese remainder theorem, all primes can be treated as independent events. Thus the probability of being relatively prime is the product of 1 - 1/p², as p runs through all the primes ≤ n. As n increases we keep bringing in more factors of p, and this product decreases monotonically. What is the limit of this product?

Consider the reciprocal of this product. Each factor now looks like p²/(p²-1). Rewrite this factor as the convergent series:

1 + 1/p² + 1/p⁴ + 1/p⁶ + 1/p⁸ + …

This is a geometric series that equals 1 over (1-1/p²), which is p²/(p²-1).

Take the product of these infinite series, one series per prime.

1	+	1/4	+	1/16	+	1/64	…
×
1	+	1/9	+	1/81	+	1/243	…
×
1	+	1/25	+	1/625	+	1/15625	…
×
1	+	1/49	+	1/2401	+	1/117649	…
×

Expand this into a sum of products, just as (a + b) × (x + y) becomes ax + ay + bx + by. Of course this is a bit more complicated, expanding an infinite product of infinite sums. Still, the distributive law holds, and fortunately, we only need consider finite products in our expansion. Infinite products of terms beyond 1 all drop to 0.

Each finite product produces 1/n² for some integer n. For instance, 1/4 × 1/9 × 1 × 1 × 1 … = 1/36. By the fundamental theorem of arithmetic, each inverse square is produced exactly once. The expanded product is now the sum of 1/n², or ζ(2).

As shown in the previous section, ζ(2) = π²/6. Take the reciprocal, and two numbers are coprime with probability 6/π², or about 60%.

This generalizes to more than two numbers. Select k numbers at random, and they will have a greatest common divisor of 1 with probability 1/ζ(k). The proof is just like the above, but uses k^th powers instead of squares.

A random integer n is square free with probability 6/π², or 60%. The math is virtually the same as the coprime problem. n is divisible by p² with probability 1/p², and if n is square free, this test fails for all primes. Manipulate the reciprocal of the product of 1 - 1/p², and get ζ(2) as before.

These proofs need a lot more δs and εs to be rigorous, but that is beyond the scope of this book.

If u is coprime to m, then we can find the inverse of u mod m, as shown earlier. Conversely, if u and m have a gcd g > 1, then all multiples of u mod m will be divisible by g. Thus u is not invertible.

The units mod m are the numbers from 1 to n that are coprime to m. This is a set of size φ(m). Let u act on the units by multiplication. Thus the image of v is uv. This action can be reverse by the action of 1/u. Therefore the action is a permutation. Multiplication by u moves the units about.

Let t be the product of all the units. Replace each unit v with uv, then multiply all the units again. The set of units has not changed, it is just rearranged. The result is t times u^φ(m). For any unit u, u^φ(m) = 1 mod m. There are 12 units mod 15; verify that 2¹² = 4096 = 1 mod 15.

For a prime p, this becomes Fermat's little theorem. Any number u from 1 to p-1 is coprime to p, and a unit. Raise this to the φ(p), which is p-1, and get 1. In other words, u^p-1 = 1. For example, raise 3 to the 16 mod 17 and get 1.

A consequence of Fermat's little theorem is the fact that there are infinitely many primes not equal to 1 mod m, for m > 2. If m is odd, double it - that won't change anything. Assuming a finite list, let z be the product of the primes that are not 1 mod m, and are coprime to m. This includes, at a minimum, m-1, or the prime factors of m-1. Remember that m is even, so z is odd. Let x = z^φ(m)-2. By Fermat's little theorem, x = -1 mod m. Now x is not divisible by any primes in m or z. Either x is prime or it consists of prime factors that are not part of z. These primes cannot all equal 1 mod m, else x = 1 mod m. Thus there are always more primes not equal to 1 mod m. As a corollary, there are infinitely many primes equal to 3 mod 4, and 5 mod 6.

If e is any exponent, raise the units mod m to the e power to find a homomorphism. This because (uv)^e = u^ev^e. If e is invertible mod φ(m), then the homomorphism becomes an isomorphism. Let d be the inverse of e mod φ(m). Raise u to the e, then to the d, and that is the same as raising u to the e×d. Since u^φ(m) = 1, u^e×d = u. We are back to start. Thus u^e and u^d are inverse permutations. These functions are central to RSA encryption, as described in the next section.

If m = 17, raise 1 through 16 to the third, and then to the eleventh, to get back to start. This gives an efficient algorithm for the cube root of x mod 17; the cube root is simply x¹¹.

Alice wants to send Bob a secret message that would be meaningless if discovered or intercepted by a third party. So Alice scrambles the message, and when Bob receives it he unscrambles the message and reads its contents. If Charlie intercepts the message, he does not know how to unscramble it, so Alice's secret is safe.

The process of "scrambling" a message is called encryption, a word that is related to the word crypt. The meaning of the message is buried deep within a metaphorical vault underground. The process of unscrambling is called decryption; the meaning is pulled out of the crypt and into the light of day.

Here is a scheme that might work. Alice asks her friend Mary to translate the message into Navaho, who broadcasts it over the radio to Frank, who translates it back into English for Bob. This is a great scheme, as long as nobody else speaks Navaho. In fact this was the case on the battlefield in World War II, where the United States employed some 400 code talkers, speaking languages that were unrelated to any European or Asian tongues. These codes were never penetrated.

This is good for a start, but Alice wants her voice encrypted and decrypted directly, without having to rely on a bilingual friend. Also, as we move in to the digital age, Alice wants to encrypt text documents and images. We need to encrypt numbers, thousands of them per second, and decrypt them on the fly. A mathematical algorithm is called for.

Alice and Bob meet in a cafe, and Alice hands Bob a string of a million random digits. They part company, and later on Alice uses these digits to encrypt her messages. Bob uses the same digits to decrypt. This scheme is virtually impenetrable, but it requires Alice and Bob to meet ahead of time, in a secure setting, and share the "key", i.e. the string of random digits. The scheme also suffers if Alice has a lot to say. A high resolution image consists of a million bytes, which is encrypted via the million digits. How does she send the next picture? Alice can reuse the string of digits, but if you do this three or four times, the enemy will detect patterns. Pretty soon he will reverse engineer the random digits, and with the key in hand, he can read all of Alice's messages, including her prior messages, which he has stored for later analysis. Ideally, the string of random digits should be used only once, hence it is called a one-time pad.

In the world of e-commerce, Alice and Bob have never met, and they never will meet. They cannot share a one-time pad, or any other "key" in confidence.

Bob runs an online store, and Alice wants to purchase something from his website. She needs to transmit her credit card number in an encrypted format, so that anybody listening in on the conversation will be denied any real information. This is where public key cryptography comes in. When you see the lock icon on your browser, your computer is encrypting your information, so that nobody except the intended party can receive it.

When you visit Bob's website, he provides a public key, which your computer uses to encrypt your messages. Bob can decrypt these transmissions, but only with his private key, which is held secret. The public key and the private key complement each other. They work together to encrypt and decrypt a message.

How do we know Charlie hasn't taken over a router on the internet, and produced a website just like Bob's? He can create his own public and private key, and hand you the public key, and decrypt your information using his private key. The encryption is perfect, and impenetrable, yet Charlie has obtained your credit card number. This problem is known as key management, and it is central to the success of any encryption scheme. However, it is not a problem in mathematics per se. Rather, it is a problem of organization. Often a trusted authority is established, who can confirm that Bob does indeed have this particular public key, and if he says otherwise, it isn't really him; it's an impostor. The internet uses a system similar to this, though I am oversimplifying it just a bit.

Let's assume we are sure this is Bob's website, and this is Bob's public key. How can we encrypt traffic, using the public key, so that only Bob can read it with his private key? There are several methods, but the most popular, and the one employed by the internet, is rsa encryption.

In 1977, Ron Rivest, Adi Shamir, and Len Adleman developed the public key encryption scheme that is now known as rsa, after their initials. (Apparently Clifford Cocks made a similar discovery in 1973, but this was not known at the time.) The method uses modular exponentiation, which can be performed efficiently by a computer, even when the module and exponent are hundreds of digits long. The public key is a modulus m and an exponent e. A message is represented by a number c between 0 and m-1. If the message is longer, chop it up into pieces and encrypt each piece. Raise c to the e power mod m and transmit the result.

The private key is the same modulus m and the inverse exponent d, such that e×d is 1 mod φ(m). As you recall from the previous section, c to the e to the d gives c back again. Raising to the e encrypts, and raising to the d decrypts.

In a typical application, m is the product of two large primes p and q. Thus φ(m) = (p-1) × (q-1). Given an exponent e, find the modular inverse relative to φ(m), and call it d. This builds the public and private keys.

The security lies in the difficulty of factoring m into its primes. Without p and q, Charlie cannot derive φ(m), and cannot compute d. Anybody can send encrypted data to Bob, but only Bob can decrypt.

As factoring algorithms improve, we are forced to use larger values of m. At the dawn of e-commerce, 128 bit encryption was sufficient, but in just a few short years the internet has advanced to 2048 bit keys, to keep ahead of the evolving factoring algorithms running on ever faster computers. Most people believe factoring is an intractable problem that becomes exponentially more difficult to solve as the numbers grow large. If we chose large values of m, such as 5000 bits, we'll be fine for the next billion years. Others believe an efficient factoring algorithm exists; it just hasn't been discovered yet. To guard against this possibility, mathematicians are developing other forms of public key cryptography that have nothing to do with factoring. These may gain widespread acceptance in the future, but for now rsa reigns supreme.

Now let's turn this scheme on its head. Since e and d are complementary, you can encrypt a message with your private key, and anyone else can decrypt it using your public key. This is a form of digital signature. Only you could have sent the message. Only you know the private key.

If Alice encrypts her message with her own private key, then reencrypts with Bob's public key, only Bob can read it, and he knows it came from Alice. This is private, signed communication in the digital age.

If p is prime, (p-1)! mod p is -1.

Group each nonzero element with its inverse. These become 1 and go away. This is fine except for values of x that are their own inverses. In other words, x² = 1. We know there are only two square roots, 1 and -1, so the product is -1.

First a lemma - if we set φ(1) = 1, then the sum of φ(d) for all d dividing n is n.

Start with n = p (prime), with divisors 1 and p. Remember that φ(p) = p-1, and we already said φ(1) = 1, hence the sum is p as it should be.

If n = p² then we have 1 + p-1 + p²-p = p². This continues through p^k.

If n = p^k×t, assume the theorem is true for t, having fewer primes. Apply φ to the divisors of n that don't contain p. These add up to t. Then apply φ to p times all the divisors of t. Since φ is multiplicative, this is (p-1)×t. Then apply φ to the divisors containing p², giving (p²-p)× t. This telescopes up to p^k×t, which is n.

Let p be prime and let n = p-1. Consider the integers mod p. Select any s nonzero and let d be the least exponent with s^d = 1. Suppose sⁱ = s^j for 1 ≤ i < j ≤ d. Divide through by sⁱ, and s^j-i = 1, which is a contradiction. The d powers of s are distinct. The next d powers repeat the first d powers, and so on. Fermat showed that sⁿ = 1, hence d is a divisor of n.

The d elements s¹ through s^d all satisfy x^d-1 = 0, and that's all the roots we are allowed. Nothing else to the d equals 1.

Mark s with d, it's minimal exponent. If d is even, s² is marked with d/2, and s^d/2 is marked with 2. If d is divisible by 3 then s³ is marked with d/3, its minimal exponent, and so on. s^d is of course 1, and is marked with 1.

If v is coprime to d then s^v is marked with d. You have to go all the way up to s^vd to get back to 1. There are therefore φ(d) elements marked with d. Nothing else can be marked with d because we have found all the roots of x^d-1.

This holds for each d dividing n. There are at most φ(d) elements marked with d. Recall that the sum of φ(d) over the divisors d of n, is n. Thus there are at most n elements marked with some exponent. But every s is marked with something. Therefore φ(d) elements are marked with d for every d. Set d = n and there are φ(n) elements marked with n. Pick one and call it b, for the base. Since b is marked with n, it has order n. The powers of b cycle through everything nonzero mod p. If p is 11, 2 is a valid base, and the cycle is:

2 4 8 5 10 9 7 3 6 1

b is sometimes called a base to support the concept of a discrete log, but b is also called a primitive root, because it is the "first" root of xⁿ-1. The powers of b cover everything.

Artin's conjecture says that every positive integer, not a square, is primitive for infinitely many primes.

As demonstrated in the previous section, the units mod p are cyclic. In other words, they can all be generated by successive powers of a primitive root. How about the units mod p^r? There are φ(p^r) of them, which is p^r-1×(p-1), but how do they behave under multiplication?

Let's start with p = 2. This is a special case. Remember that φ(2^r) = 2^r-1.

There is only one unit mod 2, and 3 is a primitive root mod 4, so assume r is at least 3.

Let x = 5, though anything 5 mod 8 would do. Thus x is 1 mod 4, but not 1 mod 8. Think of x as 1 + 4. Square x, and ratchet up the exponent from 2 to 3. In other words, x² is 1 mod 8, but not 1 mod 16. Square again to get something 1 mod 16, but not 1 mod 32, and so on. Let y = x after r-3 steps. Now y is 1 mod 2^r-1, but not 1 mod 2^r. This is a nontrivial square root of 1, something other than ±1. Square again to get 1.

Consider x^e and write e in binary. The least significant bit of e wins. If e is odd, x is multiplied by x² raised to the e/2. The former is 5 mod 8 and the latter is 1 mod 8. The result is 5 mod 8; it cannot be 1. If e is even, but not divisible by 4, Then x^e = x² times x⁴ to the e/4. This is 9 mod 16 times 1 mod 16, which is 9 mod 16. Continue clearing bits; the only way to reach 1 is when e = 2^r-2.

To recap, the order of x is 2^r-2, and that means all the powers of x, up to 2^r-2, are distinct. These are all 1 mod 4, and that neatly covers all the elements that are 1 mod 4.

We need another generator for the units that are 3 mod 4. Let g = 2^r-1-1. This is another nontrivial square root of 1. Multiplying by g permutes units, so g times the powers of x covers the remaining units, everything 3 mod 4.

Since g and x commute, these two generators are independent. One is a 2 cycle and the other starts a cycle of length 2^r-2.

Now let's look at p^r, where p is odd and r > 1. Let g be a primitive root mod p. The order of g is thus a multiple of p-1.

Let h = g^p-1. If we are unlucky, and h is 1 mod p², consider g+p instead. Use the binomial theorem to expand (g+p)^p-1. This brings in p×(p-1)×g^p-2, and higher powers of p. Thus, with g set to g+p, the new value of h is now 1 mod p, but not 1 mod p².

Raise h to the p, and ratchet up the exponent. This must be done r-1 times to reach 1. The order of g is now a multiple of p-1, and a multiple of p^r-1, and these being coprime, the order of g is the size of the entire group. The multiplicative group of units is cyclic, generated by g.

If m is composite, use the chinese remainder theorem to evaluate the group of units. Treat each prime power independently, then combine the cycles to describe multiplication mod m. The units mod 49×16×71 are described by four cycles running in parallel, with cycle lengths of 42, 4, 2, and 70.

The units of a ring, as opposed to the ring itself, are denoted with a * in the exponent. Thus the units mod m are (Z/m)^*.

Given a large composite integer n, it is usually possible to prove n is not prime by selecting a value of x such that x^n-1 is not 1. This is the pseudoprime test. It is efficient, even for large n, thanks to modular exponentiation. The adjective pseudo is prepended because you might be unlucky in your choices, and always get 1, even though n is not prime.

Some composite numbers, Carmichael numbers, pass the pseudoprime test for every value of x coprime to n. The smallest of these is 561; x⁵⁶⁰ = 1 for every x, even though 561 is composite. Here is a characterization of carmichael numbers. This is not used elsewhere, so you can skip it if you like.

Let n be a Carmichael number. By definition, n is not prime, yet x^n-1 mod n is 1 for every x coprime to m.

If p² divides n, then the group of units includes a cycle of length p×(p-1). Use this cycle to find x, a nontrivial p^th root of 1. Since p divides n, xⁿ = 1, and x^n-1 is the inverse of x. Yes x^n-1 is also 1, hence 1/x = 1, and x = 1. This is a contradiction. All Carmichael numbers are square free. (This works even if p = 2; find a square root that is 3 mod 4.)

If p is a factor of n, the group of units mod n has an independent cycle of length p-1. Let x b primitive mod p, and 1 mod every other prime dividing n. This can be done using the chinese remainder theorem. Now x^p-1 = 1, and if x^n-1is also 1, then p-1 divides n-1. This relationship holds for every p dividing n. Conversely, if this relationship holds, and we consider any element x coprime to n, raise x to the p-1 and get 1, at least mod p; so then x to the n-1 is also 1 mod p. But this holds for every p dividing n, hence x^n-1 = 1 mod n. The criterion of p-1 into n-1 is necessary and sufficient.

If n is even, then p-1, for some odd prime, will not divide n-1. Carmichael numbers are odd.

If n = pq, where q is larger, then q-1 divides pq-p, and cannot divide pq-1. Carmichael numbers have at least three prime factors.

Recall the earlier example 3×11×17 = 561, and note that 2, 10, and 16 all divide 560.

It requires only algebra to show that the product of 6k+1, 12k+1, and 18k+1 is carmichael, provided all three factors are prime. The same holds for 60k+43, 180k+127, and 300k+211.

There are an infinite number of carmichael numbers. This nontrivial result was proved in 1994 by Alford, Granville, and Pomerance. In fact there are infinitely many carmichael polynomials of the form a₁k+b₁ × a₂k+b₂ × a₃k+b₃. Another example is 30k+13 × 90k+37 × 150k+61.

Here is an extended criterion for carmichael, a stronger assertion than p-1 into n-1. Let n, a carmichael number, be the product of p times q₁ through q_j. Here p could be any of the prime factors of n. For convenience, let t be the product of q₁ through q_j.

p-1 divides n-1

p-1 divides pt-1

For some c, pt - 1 = cp - c

cp - tp - c = -1

cp - tp + t - c = t - 1

(c-t) × (p-1) = t - 1

p-1 divides t-1

Verify this for 561: 2 divides 186, 10 divides 50, and 16 divides 32. It also holds for the polynomial triples shown above.

Since the algebra can be reversed, this condition is both necessary and sufficient.

If you can factor p-1, you can prove p is prime. Of course you need to prove that the factors of p-1 are also prime, but they're smaller, so perhaps that's easier.

If p is prime it has a primitive root, lots of them. Try a couple values at random; you're sure to run into one. If g is primitive, and q is a prime dividing p-1, then g^(p-1)/q will not be 1 mod p. We haven't gone through the entire cycle. Conversely, assume g^p-1 is 1, but the exponents (p-1)/q don't produce 1. If g^k = 1 then k divides p-1, yet it doesn't divide (p-1)/q for any of the primes q in p-1, so k cannot be a proper factor of p-1. G generates a cycle of length p-1. All the nonzero elements are units; they are all coprime to p. Hence p is prime.

For large primes however, p-1 can rarely be factored.

In the context of a fixed modulus m and a specific exponent n, c is a residue if there is some x such that xⁿ = c mod m. In other words, c has an n^th root. When n is 2, c is a quadratic residue. When n = 3, c is a cubic residue. Quartic and quintic residues are defined similarly.

If n is coprime to φ(m), raising to the n^th power permutes the units. This can be reversed, and every unit is a residue with precisely one root. For instance, every element mod 5 is a cubic residue, because 3 is coprime to 4. The cube root of 2 is 3, and so on. Every unit mod 5×17² is also a cubic residue, because 3 is coprime to 4 and 17 and 16.

Zero is always a residue, and it has precisely one root when m is square free.

The Legendre symbol [a\p] for p prime is 0 if a = 0, 1 if a is a quadratic residue, and -1 if a is not a quadratic residue.

Let r be a primitive root mod p. The powers of r, from 1 to p-1, cover all the nonzero values. The even powers of r are the squares, the quadratic residues. The odd powers of r cannot be squares, for the square of any power of r presents an even exponent. There's an easy way to see if a number is a square. Anything raised to an even power, then to the (p-1)/2, is really raised to the p-1, and becomes 1. Conversely, when r is raised to an odd power, and then to the (p-1)/2, it becomes -1. Therefore [a\p] is the same as a^(p-1)/2.

Using this equivalent formula, [a\p] × [b\p] = [ab\p]. The legendre symbol respects multiplication. In other words, the legendre symbol is a homomorphism from (Z/p)^* onto ±1.

Let's see what this means when a and b are and are not residues. The product of squares is a square (that's pretty obvious), a square times a nonsquare is a nonsquare (that's sort of obvious), and the product of nonsquares is a square (that's far from obvious). For instance, 3 and 5 are not squares mod 7, yet their product, 15, is 1 mod 7, which is certainly a square.

Sometimes it is convenient to map the homomorphism onto 0 and 1, for squares and nonsquares respectively. In this case multiplication upstairs corresponds to addition downstairs. If r is the primitive root, and a = r^j, then[a\p] is just the parity of j, or j mod 2. This generalizes to higher residues. The cubic residue of a might be 0 1 or 2, according to j mod 3, assuming 3 divides p-1. The kernel of this map is the cubes. We may explore this generalization later, but for now let's return to quadratic residues mapping onto ±1.

To see if -1 is a square, raise it to the (p-1)/2 power. The result is 1 only when (p-1)/2 is even. Therefore -1 is a quadratic residue iff p = 1 mod 4.

If -1 is a square mod p, is it also a fourth power? Only if p-1 is divisible by 8, that is, p = 1 mod 8. Similarly, -1 is an 8^th power iff p = 1 mod 16, and so on. We're really taking the discrete log of -1 mod p, which is (p-1)/2, and then counting the powers of 2 therein.

Next consider [2\p]. Write the following series of equations.

1 = (-1) × (-1)¹
2 = 2 × (-1)²
3 = (-3) × (-1)³
4 = 4 × (-1)⁴
…

Let s be half of p-1, and let the equations run from 1 to s. Multiply the s equations together, hence the left side becomes s!. The right side includes a 2 from the second equation, a 4 from the fourth equation, and so on. Once we pass s, the right side still has the even numbers, but they are disguised as negative numbers. Twice s shows up as -1 (first equation), twice (s-1) is -3 (third equation), and so on. So the right side contains s! × 2^s. The s! cancels the s! on the left, and 2^s is the same as [2\p]. Finally, the right side includes (-1)^1+2+…+s. The exponent simplifies to s×(s+1)/2. This is even when s is 3 or 4 mod 4, which means 2 is a quadratic residue iff p = 1 or 7 mod 8. 2 is the square of 6 mod 17, but it isn't the square of anything mod 13.

Let p and q be odd prime numbers. Quadratic reciprocity states that [q\p] = [p\q] iff p and q are not both 3 mod 4. If you know one legendre symbol, you know the other.

There are many proofs of this theorem; Gauss came up with several all by himself. Here is one that doesn't require a great deal of knowledge from other branches of mathematics, though it is still quite technical.

To simplify the notation downstream, let s = (p-1)/2 and t = (q-1)/2. Thus s is half of p and t is half of q, rounded down.

If we knew that the product [q\p] × [p\q] was -1 to the st power, we would be done. That's because st is odd precisely when p and q are 3 mod 4. The two legendre symbols are opposite only when their product is -1, when st is an odd power, when p and q are 3 mod 4. Otherwise the legendre symbols are equal. We need to show that [q\p]×[p\q] is -1 to the st power.

Consider the rectangular region of the xy plane where x ranges from 1/2 to p/2 and y ranges from 1/2 to q/2. This region contains st lattice points, i.e. the points with integer coordinates.

Draw three parallel lines through this rectangular region, each having slope q/p. The first runs through the origin, all the way to the point p/2,q/2, which is the upper right corner of our rectangle. Call this the main line.

Shift the main line up ½ unit to get the upper line. Let w = q/(2p), which is where the main line hits the left side of the rectangle. (You might have to extend the left side downward if q < p.) Shift the main line down by w to get the lower line. Thus the upper line is raised by ½ and the lower line is down by w. A fourth line runs parallel to these three, and will be described later.

Notice that the upper and lower lines cross the axes at ½, consistent with the lower left corner of the rectangle.

Let c be the center of the rectangle. It's coordinates are (p+1)/2 and (q+1)/2. Spin the plane 180° about c, and the rectangle coincides with itself, and lattice points move to lattice points. The lattice point that was at the upper left of the rectangle is now at the lower right, and so on.

The lower line hits the right side of the rectangle at w below the upper right corner, by definition. Meantime the upper line hits the left side of the rectangle at w+½-½ = w above the lower left corner. There is a symmetry here. Rotate the rectangle about its center, and the upper and lower lines switch positions. By correspondence, there are as many lattice points above the upper line as there are below the lower line. Their sum is even, so we can subtract it from st without harm. This will not change the parity of st, or the value (-1)^st. We only need look at the two middle strips, bounded by the main line and the upper and lower lines.

We need to show that the product [q\p] × [p\q] is (-1)ⁿ, where n is the number of lattice points inside the two strips. It would be enough to show that [q\p] is (-1)^k and [p\q] is (-1)^l, where k counts the points in the upper strip and l counts the points in the lower strip. With k+l = n, the product would then be (-1)ⁿ, which is the same as (-1)^st, which is what we want.

There is another symmetry here, this time reflection. Reflect everything through x = y. This causes p and q to change places. The three lines now have slope p/q. They still intersect the axes at 0 and ½. They intersect the sides of the rectangle at p/(2q) from the corners. If the upper strip yields the correct formula here, reflect back to get the lower strip in the original rectangle. Thus it is enough to prove the relationship for the upper strip.

At this point we must draw a fourth line, parallel to the other three. Recall that the upper line was produced by shifting the main line up by ½. Now shift the main line down by ½ to get the fourth line, which I will call the mirror line. For every value of x, the upper and mirror lines have y coordinates that differ by 1, with the main line trapped halfway between. When x is an integer between 1 and s, exactly one lattice point is trapped between the upper and mirror lines. Which points lie above the main line, and which lie below?

None lie precisely on the main line, because p and q are prime, and a lattice point on the main line implies a common factor. Each point is above or below. It depends on the remainder when qx is divided by p. This is indicated by the distance from the main line to the lattice point below. If the remainder is less than half of p, the lattice point is trapped below the main line. If the remainder is more than half of p, the lattice point is trapped above the main line.

Two remainders will never be the same, else q times the difference in x coordinates is divisible by p. Since the difference in x coordinates is less than p, and q is a prime other than p, this is impossible. Similarly, two remainders can't be opposite mod p, else q times the sum of the x coordinates is divisible by p.

Let r(x) be the remainder of qx mod p when this remainder is less than half of p, and p - qx mod p when the remainder is more than half of p. Now r(x) is a scaled measure of the vertical distance between the main line and the trapped lattice point. There are s values of r(x), one for each salient x coordinate, and they're all different, and they're all in the range 1 to s.

Looking mod p, qx is either r(x) or -r(x). It is -r(x) when the lattice point lies above. There are k of these by definition. Multiply these equations together and the product over qx equals the product over r(x) times (-1)^k.

The product of r(x) as x runs from 1 to s is s!. The product over qx is q^s×s!. Remember that q^s is the same as [q,p]. Cancel s! from both sides, and [q\p] = (-1)^k; and that completes the proof.

The question is, how did Gauss ever think of this? Got me! I guess the same way Beethoven thought of his 9^th symphony.

Here are a few applications of reciprocity.

Is 3 a square mod p? Flip to find out, and negate if p = 3 mod 4. The answer is yes if p = ±1 mod 12. For instance, 3 = 5 squared mod 11. Similarly, [5\p] = 1 when p = ±1 mod 5, and [7\p] = 1 when p = ±{1,3,9} mod 28.

When factoring 999993, p must divide 1000²-7. Reducing mod p shows 7 is a residue, hence p is ±1, ±3, or ±9 mod 28.

If p is a Fermat prime, 3 is primitive mod p. Recall that p-1 is a power of 2. There are φ(p-1) = (p-1)/2 primitive roots, each a nonsquare. Assuming p > 3, 3 does not divide p or p-1. Thus [3\p] = [p\3] = -1, and 3 is primitive.

If a Fermat candidate n, with exponent 2^h, is not prime, and p divides n, then 2 raised to the 2^h is -1 mod p. Take discrete logs in the multiplicative group mod p, and log(2)×2^h maps to (p-1)/2. This contains all but one of the factors of 2 in p-1. This means log(2)×2^h+1 contains exactly the powers of 2 in p-1. For h > 1, p = 1 mod 8, and 2 is a square. Thus log(2), relative to a primitive root b, is even. Pull a factor of 2 out and log(sqrt(2))×2^h+2 accounts for the factors of 2 in p-1. Therefore p is 1 mod 2^h+2.

I tried this for the first few fermat numbers, including 2¹²⁸+1. The factors are 59649589127497217 and 5704689200685129054721; and each is 1 mod 512.

If n is odd, the jacobi symbol [a\n], which looks exactly like the legendre symbol, is the product [a\q] for all prime factors q in n, including multiplicities. Since n factors uniquely, this is well defined. [a\n] = 0 iff a and n have a common factor q.

Unlike the legendre symbol, the value of the jacobi symbol does not tell you whether a is a square mod n. Intuitively, let n = 77 = 7 times 11. By the chinese remainder theorem, these run as separate rings. Half the coprime elements are squares mod 7, and half are squares mod 11, so one fourth are squares mod 77. Yet the jacobi symbol is 1 half the time. In particular, -1 is not a square mod 7, nor mod 11, yet [-1\77] = 1.

Since multiplication distributes over the legendre symbol, it also distributes over the jacobi symbol. That is, [a\n] × [b\n] = [ab\n]. Use this in reverse to write [a\n] as the product of [p\q] for all primes p in a and all primes q in n, including multiplicities.

If the jacobi symbol is pulled apart, as above, assume there are k primes in a that are 3 mod 4, and l primes in n that are 3 mod 4. Flip the legendre symbols over and follow the quadratic reciprocity law, bringing in kl factors of -1. This is -1 iff kl is odd iff a and n are both 3 mod 4. Therefore quadratic reciprocity is valid for jacobi symbols, using the same formula.

Show that [2,n] and [-1,n] can be evaluated by looking ad n mod 4 and mod 8, as though they were legendre symbols.

At the end of the day, one does not need to factor a or n to determine [a\n]. Perform a procedure similar to Euclid's gcd algorithm, keeping ±1 in an accumulator. Divide the bottom into the top, take the remainder, pull out factors of 2 (adjusting the accumulator), and flip (adjusting the accumulator). When the top becomes 1, you are done.

If n happens to be prime (at the start), then the original jacobi symbol [a\n] is itself a legendre symbol. You have computed the legendre symbol, and you know whether a is a square mod n.

For grins, here is the math to see if a million and 7 is a square mod a billion and 7. (1000000007 is a prime number.)

1000007 mod 1000000007

Both numbers are 3 mod 4, so negate the accumulator as you flip
1000000007 mod 1000007 sign -1

Reduce
993014 mod 1000007

Numbers have to be odd, so divide the top by 2.
The bottom is 7 mod 8, so 2 is a square, and nothing changes.
496507 mod 1000007

1000007 mod 496507 sign 1

6993 mod 496507

496507 mod 6993

4 mod 6993

4 is a square, so we're done. The accumulator holds 1, so 1000007 is a square mod 1000000007, though we have no idea the square root. In fact the square root can be found efficiently; factor the polynomial x²-1000007 mod 1000000007 using an algorithm that will be described in the chapter on finite fields; and the answer is ±227949833.

There is much more on generalized reciprocity, the jacobi symbol, and applications here.

Let n be a 40 digit composite number that you want to factor. You can divide n by all the primes up to the smallest prime p dividing n, but if p is 20 digits long, then you have to run 10²⁰ high precision operations, and that's going to take a while. Pollard Rho factors n in sqrt(p) time, which is about 10¹⁰ operations, and brings numbers like n within reach.

Imagine a function f that maps integers mod n into integers mod n. If a prime p divides n, f maps integers mod p into integers mod p. Invoke f again and again, and you will fall into an infinite loop as soon as a value repeats. Assuming the behavior of f appears random, the birthday paradox says a value will repeat in sqrt(p) steps.

Assume f has a period of length l, approximately equal to sqrt(p). This period starts at step k, so that a_k = a_k+l = a_k+2l and so on. Start with a seed, such as 5, and let one wheel spin twice as fast as the other. In other words, x = f(x), and y = f(f(y)). After m steps, x = a_m, and y = a_2m. The difference is m, and m increases until it equals l. At that point x = y mod p. Run gcd(n, x-y) to extract p.

The other prime q could fall into a period of length l as well. If that happens, x-y = 0 mod n, and gcd doesn't help. This doesn't happen very often, but it does happen, and when it does, you can go back and try a different seed or modify f in some way.

The name "pollard rho" comes from John Pollard, who developed the algorithm, and the Greek letter rho, which looks like a circle with a tail. ρ The first few million values march along the tail and around the circle, until a value repeats, and then you are in a loop.

Here is some code.

/* assumes n is not already prime */
rho(n)
{
	seed = 5, offset = 29;
	limit = 2 * FourthRoot(n);

restart:
	x = y = seed;
	for(i=0; i<limit; ++i) {
		x = (x*x + offset) % n;
		y = (y*y + offset) % n;
		y = (y*y + offset) % n;
		difference = x-y;
		if(difference < 0) difference = difference + n;
		if(difference == 0) break; /* p and q in the same loop */
		g = gcd(n, difference);
		if(g > 1) return g; /* proper factor of n */
	}
	offset += 7;
	goto restart;
}