Abstract Algebra and Discrete Mathematics, Finite Fields

Finite Fields
Primitive Root
-1 and 2 as Squares
Unique Field for each Prime Power
The Automorphisms and Subfields of F
Circular Shift of the Roots
Finding an Irreducible Polynomial
Irreducibles by Degree
Irreducibles by Squares and Shifted Squares
Polynomial Factorization
The Norm Distribution Theorem
Counting Matrices
The Structure of O₂
O Acts on Unit Vectors
Quaternions and Matrices are Isomorphic
Algebraic Closure mod p

Add 1 to itself again and again in a finite field and the numbers must eventually return to 0, giving a characteristic of p. Thus Z/p is at the base of every finite field.

In fact Z/p is itself a field, and the underlying reason for this is unique factorization in the integers. For a nonzero x, multiply the nonzero integers mod p by x and you never get 0. Otherwise xy = kp, and p would divide x or y. Similarly, xa ≠ xb, else x*(a-b) would be 0. Multiplication by x maps a finite set into itself and is injective. The map becomes a permutation, hence there is some y with xy = 1. This makes x invertible, and Z/p is a field.

In fact there is only one field of order p. The identities 0 and 1 are fixed, and 1+1 has to be 2, and 2+1 has to be 3, and so on until the field is set.

Are there larger finite fields on top of Z/p?

Well you can always extend a field. Adjoint the indeterminant x to get (Z/p)(x), the fractions of polynomials in x with coefficients mod p. This is an interesting field, but it's not finite. If you want a finite field, you need a finite, i.e. algebraic, extension of the base field Z/p. That's not a problem; adjoin anything that is missing, such as the square root of 3 mod 7.

A finite field extension implies a finite basis. If the basis has size n, for an n dimensional extension, then the number of vectors is pⁿ, selecting a coefficient mod p for each basis element. Everything in F is uniquely spanned by this basis, hence the order of F is pⁿ.

There is no field of order 35. There is however a field of order 49. In general, you can adjoin a root u of an irreducible polynomial over a base field K, and K[u] becomes a larger field, whose dimension is the degree of the polynomial. Let u satisfy the irreducible polynomial x²-3, mod 7, and the dimension is 2, with basis 1 and u. The size is 49. Typical elements are 3+u, 6+5u, 4, 3u, 1+2u, etc.

Math in a finite field is done in the usual way; multiply polynomials together, with coefficients mod p, then collapse the result as directed by the irreducible polynomial.

Let F have order pⁿ, hence there are m = pⁿ-1 nonzero elements. Each of these is invertible, hence a unit. They cannot be zero divisors. Thus multiplication by x permutes the nonzero elements of F. From here the proof of Fermat's little theorem applies, almost word for word. Let t be the product of all the units. Replace each v with x*v and multiply them all again. The result is t*x^m. Thus every nonzero x, raised to the pⁿ-1, is 1 in F.

The proof of the primitive root is once again the same. Replace p-1 with m = pⁿ-1, and let d loop over all the divisors of m. There is some element with order m. Actually there are φ(m) elements with order m. Select any one of these, call it b, and b is a primitive root of F. As such, b can be used as a base for discrete logs.

Can we tell at a glance whether -1 and 2 are squares in F, similar to the integers? The answer is yes, and the criteria are the same.

If -1 is y², then y⁴ = 1. The discrete log of y, times 4, is pⁿ-1, hence pⁿ-1 is a multiple of 4. This can be reversed to find y, the square root of -1. Therefore -1 is a square in F iff the order of F is 1 mod 4. This happens iff p is 1 mod 4, or p = 3 mod 4 and the dimension of F is even.

In characteristic 2, -1 = 1, and 1 is the square root of 1, so technically -1 is a square.

If p = 1 or 7 mod 8 then 2 is a square in Z/p, and it remains a square in F.

Assume p = 3 or 5 mod 8, and the dimension of F is odd. If 2 has a square root in F then adjoin it and find an extension of dimension 2. This cannot live in a larger extension of odd dimension; therefore 2 is not a square.

Finally let p = 3 or 5 mod 8 and let the dimension of F be even. Let m = pⁿ-1, so that the logs run from 0 to m-1. Since p to an even power is 1 mod 8, m is divisible by 8. Let y have a log of m/(p-1). Raise y to the p-1 and get 1, and all the powers of y are distinct, hence they are the roots of x^p-1-1 = 0, and they are the integers. In fact y is primitive for the integers mod p. Since p-1 is 2 or 4 mod 8, the log of y is even. The log of every integer is even, and every integer, including 2, has a square root.

If w is the order of F, the rules are the same as they were mod p: -1 is a square iff w = 1 mod 4, and 2 is a square iff w = 1 or 7 mod 8.

There is but one field, up to isomorphism, for each prime p. This was described at the start of this chapter. In the same way, there is but one field for each prime power. Adjoin the square root of 3 to the integers mod 7, or the square root of 5. There is an isomorphism between these two fields of order 49; they are essentially the same.

Let F have order pⁿ, with m = pⁿ-1 units. As shown above, the units are the m^th roots of 1. Therefore F is the splitting field for x^m-1 over the integers mod p. This is the unique characterization of F.

If you want to be technical, factor x^m-1 into irreducibles, and select the largest irreducible, ordered in any way you wish. Adjoin u, the root of this polynomial. This defines the next finite field. If it is not all of F then factor x^m-1 over (Z/p)(u), giving x-u along with other irreducibles. Once again select the largest irreducible, and let v be a root. Adjoin v, which builds the next field extension. Continue along this path until you reach F. Any field of order pⁿ has to have this structure. Given two such fields, equate 1 with 1, and u with u, and v with v, and so on to build the isomorphism.

This proves the field is unique, but does it exist?

Let r(x) = x^m-1, and use formal derivatives to look for any repeated roots. The derivative is mx^m-1, which has a root of 0. This cannot be a root of r(x), hence all the roots of r(x) are distinct.

Adjoin these roots to Z/p, and call the resulting field F. With 0 included, the order of F is at least m+1 = pⁿ. This includes the integers, since any integer to the p-1 is 1, and p-1 divides m.

Let c be the frobenius homomorphism c(x) = x^p. Every field homomorphism is injective, and since the field is finite, with dimension no larger then m!, c is also surjective, hence a field automorphism.

Let d = cⁿ. Thus d is an automorphism, the result of applying c n times. Let L be the subring of F that is fixed by d. If d fixes x, and y is x inverse, then write xy = 1 and apply d, whence d(y) has to equal y. The subring L is a field.

By construction, x lies in L iff x^m+1 = x. This is 0 and the m roots of 1, i.e. 0 and the roots of r(x) Therefore L is a finite field of order pⁿ. In fact F, built from the roots of r(x), = L.

If a field L is finite, of order p^r, there is exactly one n dimensional extension up to isomorphism, namely the finite field of order p^rn.

You might be wondering if this works for an extension of dimension p. Well there is nothing in the above proof that precludes it. Of course you can't adjoin a p^th root of an integer c, since x^p-c is not irreducible; in fact it is (x-c)^p. But there is some irreducible polynomial of degree p.

Consider p = 2, and extend the integers mod 2 up to a field of order 4. The recipe says to adjoin the cube roots of 1. In other words, adjoin x³-1, or an irreducible factor thereof. With -1 = 1, we have the following factorization.

x³ + 1 = (x² + x + 1) * (x + 1)

The first factor is irreducible of degree 2, and generates the finite field of order 4.

Let F be a finite field of order pⁿ, and let c be the frobenius homomorphism c(x) = x^p. Since the domain is a field, c is a monomorphism, and since c maps a finite set into itself, c becomes an automorphism.

Repeat c again and again, denoted c^k, to get other automorphisms, until cⁿ raises each x to the pⁿ, which is just x, whence cⁿ is the identity map. After you apply c n times, you are back to start.

Here is another way to view it. Let b be a primitive root of F, so that the powers of b run through all the units of F. Take discrete logs, and the log of b is 1, and the powers of b step through the integers 1 through pⁿ-1, which is back to 0, that is, b⁰ = 1. Given k < n, b is raised to the p^k, which lands on something other than b. You don't get back to a trivial automorphism until cⁿ. Thus there are n distinct automorphisms, forming a cycle of length n.

c, c², c³, c⁴, … cⁿ = 1

Are there any other automorphisms? Let an automorphism map b to b^j, whence this same automorphism has to raise every power of b to the j. In other words, it multiplies the log by j. If j is divisible by p then it includes an instance of c, the frobenius automorphism. Apply c first if you like, then raise everything to the j/p. Therefore we can assume j is not divisible by p.

Since j is only meaningful mod m, where m = pⁿ-1, we can assume j is strictly between 1 and m. For every x, (x+1)^j = x^j + 1^j. This becomes a polynomial of degree ≤ m-2, with lead term jx^j-1. There can be at most m-2 solutions, yet we require m+1 solutions. This is a contradiction, hence all the automorphisms are generated by c as described above.

The subfields are determined by the automorphisms. Let k be a factor of n, and let L be the subfield fixed by c^k. Thus x is in L iff x raised to the p^k gives x back again. This means x = 0 or x is a p^k-1 root of 1. Look at these roots of 1 by taking discrete logs base b. Let r = (pⁿ-1)/(p^k-1). If y = b^r, then log(y) = r. Any multiple of r, times p^k-1, is a multiple of pⁿ-1, which takes us back to b⁰ = 1. The powers of y, corresponding to the multiples of r, give all the p^k-1 roots of 1. This is our subfield L inside the larger field F.

As k steps through the divisors of n, c^k fixes ever larger subfields, starting with k = 1, which fixes only the integers, (0 and the p-1 roots of 1), all the way up to k = n, where cⁿ fixes all of F. This is not a tower of subfields however; when n = 6, the subfield of order p² does not live in the subfield of order p³.

[ venn diagram big shape labeled F order p⁶ then two regions inside labeled p³ and p², intersecting in region at the bottom labeled integers p¹ ]

Are there any other subfields? By dimensionality, any subfield L has to have a size p^k, where k divides n. There is at least one such subfield, fixed by c^k. If there is another, then there are more than p^k-1 solutions to the polynomial x to the p^k-1 = 1, which is impossible. Therefore there is one subfield for each divisor of n.

That's one subfield for each cycle of automorphisms within the larger cycle generated by c. This is a wonderful correspondence between the subfields of F and the subgroups of the automorphisms of F. This is galois theory, and we're definitely going to explore this in a later chapter, because it's just too beautiful to miss.

[ hexagon with c⁰ through c⁵ on the vertices, * at c⁰ for the trivial automorphism, edge for the hexagon cycle of length 6, inscribed triangle for cycle of length 3, and a diagonal for cycle of length 2, all connected to c⁰, caption 4 cycles build 4 subfields ]

Let u be any element that generates all of F. Thus u satisfies an irreducible polynomial q of degree n.

The images of u under the automorphisms are u to the p, u to the p² , and so on up to u to the pⁿ, which gives u back again. Raise q(u) to the p power, using the frobenius automorphism, and the coefficients of q do not change. Thus each image of u is a root of q. The n conjugates of u are the n roots of q. Each automorphism performs a circular shift on these roots. Apply c, and these roots shift around by 1.

[ u u² u³ u⁴ u⁵ u⁶ back to u around in a circle with c in the center ]

Let's use this example to dispel a common myth. The conjugates need not span the extension as a vector space. The powers of u span, from 1 to u^n-1, but not the conjugates of u. Adjoin u²+1 to Z/3. Since 2 is not a square, this is irreducible. The "other" square root of 2 is -u, hence u + its conjugate is 0. In other words, u+u³ = 0. The conjugates are not linearly independent, and cannot span the entire vector space of dimension 2.

Also, the adjoined element need not be primitive. In the above example, u⁴ = 1. Thus u does not have order 8, and is not primitive. A primitive element is u+2, whose square is u.

To use a finite field in a real world application, such as error correcting codes, you need a representation of that finite field as an irreducible polynomial mod p. How many irreducible polynomials are there?

A polynomial of degree n is irreducible iff its roots generate all of F, iff its roots do not lie in a smaller subfield. Count the elements in F, but not a subfield of F, and divide by n; that is the number of monic irreducible polynomials mod p.

A maximal proper subfield has order p^k, where n/k is some prime d. An element x may live in none of these subfields, or 1, or 2, or 3, etc. This suggests inclusion exclusion. Here is the formula for the top, for the first few values of n.

p² - p
p³ - p
p⁴ - p²
p⁵ - p
p⁶ - p³ - p² + p
p⁷ - p
p⁸ - p⁴
p⁹ - p³
p¹⁰ - p⁵ - p² + p

[ Same venn diagram as earlier, with F = p⁶ but also label this as the top and shade or color this region ]

The first term, pⁿ, dominates everything else. Divide by n to count the irreducible polynomials. That's pⁿ/n. Since there are pⁿ monic polynomials in total, choose one at random and it will be irreducible with probability 1/n. But how do you know it's irreducible? What is the test?

Review the pseudoprime test and the proof of primality, the procedures here are completely analogous.

Let q(x) be a polynomial that might be irreducible. Let u be a primitive element, some polynomial in x. Thus u generates F^*. Remember that x itself might not be primitive, but there's a good chance it is, so try u = x first. If this fails, just switch to another polynomial; that's faster than setting u to various polynomials in x and doing all the math.

Let m = pⁿ-1. If q is irreducible and x is primitive, then x^m = 1. If this fails then go back and try another polynomial q(x).

If x^m = 1, you can try the same test for other polynomials in x; they should all come out 1. But this really doesn't help very much. There are lots of reducible polynomials that will pass this test every time. Let n = 4 and let q(x) be the product of two different irreducible quadratics r(x) and s(x). Note that p²-1 divides evenly into m. Looking mod r, x to the p²-1 is 1, so x^m = 1. The same thing happens mod s, and by the chinese remainder theorem, x^m = 1 relative to q(x). And this happens for every polynomial that is coprime to r and s. Everything to the m is 1, even though q is reducible. This is a carmichael polynomial, similar to a carmichael number. Here is an example mod 7.

Ok that didn't help; how about a more direct proof of irreducibility. If m can be factored, you can prove primality just as we did for the integers. Let t be a prime dividing m and evaluate x^m/t. If this is 1 then x has a shorter cycle and is not primitive. If it is not 1, for every t, then the period of x is a factor of m, but not a factor of m/t for any prime t, hence the period is m. The powers of x are distinct all the way up to m. Everything in F is invertible, and F is a field. That makes q(x) irreducible, and a valid representation for F. The math is efficient, even if p and q are large. The only tricky part is the factorization of m. This may not be practical if the field is large.

There is another proof of irreducibility, that doesn't require a factorization of pⁿ-1, and doesn't require that x be primitive.

If q(x) is a constant it is a unit, and does not meet the definition of irreducible.

If q is linear, i.e. x-c for some integer c, q is irreducible.

If q, of degree > 1, has no constant term, then q is divisible by x, and is not irreducible.

Next, take the gcd of q(x) with x^p-1-1. All the linear polynomials x-c, multiplied together, create a polynomial whose roots are, precisely, the p-1 roots of 1. At the same time, x^p-1-1 has distinct roots, and they are all p-1 roots of 1. Therefore x^p-1-1 has 1 through p-1 as roots, and splits into p-1 linear pieces x-c for each nonzero integer c. If q(x) is divisible by any linear factor x-c, the gcd will bring this out. Conversely, if q has no such factors, the gcd will be 1.

If q has degree 2 or 3 we are done, for if q is reducible than at least one of the factors is linear, and we already ruled that out. So assume the degree of q is at least 4.

Let L be the finite field of order p^k. For notational convenience let s = p^k-1. L is 0 and all the roots of x^s-1. This polynomial contains plenty of irreducibles of degree k. Suppose there is another one h(x) that is not here. Let u be a root of h(x), and u defines another subfield of size p^k. Two such subfields cannot exist, hence x^s-1 contains every irreducible polynomial of degree k.

In the same way, x^s-1 contains all the linear factors x-c, and all the irreducibles of degree d for every d dividing k, because those subfields are all present in L, and are unique within F.

Take the gcd of q and x^s-1. In one swoop you have pulled out the irreducibles of degree k, or d dividing k, that live in q. If you find any, then q is not irreducible.

Apply this test as k runs from 1 to half the degree of q, and if nothing falls out, then q is irreducible.

k = 1: linear factors
k = 2: quadratic factors
k = 3: cubic factors
…
k = d/2: half the degree of q

q(x)

If q has degree d, you can let k run from d/4+1 to d/2. Suppose q has an irreducible factor of degree j. If j is d/4, it will be found when k = d/2. If j is smaller, it is a divisor of something between d/4 and d/2. This irreducible factor will be extracted at some level k.

Another shortcut tests for linear factors first, k = 1, and remembers g(x) = x^p-1 mod q(x) along the way. At each level k, divide s by p-1. Raise g to the s/(p-1) mod q. This is the same as raising x to the s mod q, so subtract 1 and proceed with the gcd algorithm.

If q has several factors of h, where h is irreducible, the gcd algorithm will pull out only one instance of h. That's because x^s-1 has only one instance of h. That's ok; I'm not trying to find a complete factorization for q, I just want to know if it's irreducible.

The gcd of q and x^s-1 might be q. This could happen if q consists entirely of distinct irreducible factors of degree k. That's ok, it proves q is reducible.

This is efficient, even if p is a thousand digits long. Yes, x^s-1 is a really big polynomial, but the first iteration of the gcd algorithm divides x^s-1 by q, and finds a remainder whose degree is less than that of q. From here the polynomials are manageably small.

Apply the procedure described in the previous section, and assume all the factors of q have degree k.

If q has degree k itself, then q is irreducible. Otherwise it is composite.

Let F be the finite field of order p^k. Let h(x) be a factor of q(x), and let u be a root of h. Let b be primitive in F, and let u = b^j. In other words, j = log(u).

Let g be the gcd of j and p^k-1. Thus g determines the cycle that u lives in. If g = 1 then u is primitive, just like b. If g = 2 then u is a square, and cannot be primitive. If g = 3 then u is a cube, and so on.

Apply the frobenius automorphism, which takes u to its next conjugate u^p. The log is multiplied by p, that is, j → jp. Since p is coprime to p^k-1, the gcd does not change. The conjugate lives in the same cycle as u itself. In fact all the roots of h live in the same cycle. They are all primitive, or all squares, or all cubes, etc.

Let s = (p^k-1)/c, and to keep things simple, let c = 2. Is be a root of x^s-1? No - because b has to go all the way to p^k-1 to reach 1. However, b² is a root of x^s-1. All the powers of b² are also roots, and these are all the roots we are allowed. Therefore, the roots of x^s-1 are the squares in F, and the nonsquares are left on the cuttingroom floor.

Take the gcd of q and x^s-1. If the roots of h are squares, and the roots of some other factor of q are nonsquares, then h is pulled out, and q is reduced.

Other values of c can be used to separate q. Set c = 3, whence s = (p^k-1)/3, and x^s-1 holds all the cubes. Take the gcd of q and x^s-1 to separate the factors whose roots are cubes from the factors whose roots are noncubes. This can be done for any value of c that divides p^k-1. However, I have found that c = 2 is usually sufficient.

At this point we see why p has to be odd. If p = 2 and k = 3 then p^k-1 is 7, which is prime. There is no value of c to partition the units of F, and no way to separate the factors of q.

Even if p is odd, such as 59, q might consist of three factors, all with roots that are squares. And it's very likely that none of these roots are 29^th powers, so that looks like the end of the game, but it's not.

Let x = y+1, and substitute for x in q(x), giving a new polynomial q(y). This is called a shift. If q(y) factors then replace y with x-1 and q(x) factors. The prime factorization of the shift is the shift of the prime factors, but all the roots are shifted by 1. If all the roots were squares to begin with, then the shift will make some of them squares and some of them nonsquares, seemingly at random. The shifted version of q can be factored through x^s-1, and the factors can then be shifted back. If the gcd doesn't yield any fruit, shift again, and again. You can shift up to p times, looking for separation. It almost always works. If not, you could use any linear function such as x = 3y+8, though I have not tried that variation.

There is a catch. If q = h², then the roots shift together, and h is never separated from h. This problem is solve by extracting repeated roots at the outset using formal derivatives. See the next section for more details.

Even if the factors are distinct, it is possible to construct a pathological composite polynomial. This is likely to happen mod 3, because you only have 3 chances to shift the roots. Find 9 different irreducible polynomials of degree 4, (there are (81-9)/3 = 24 to choose from), and the shifted roots of any one of these polynomials are going to be square or nonsquare, 3 times, giving 8 possibilities. Two of them are going to come out the same, and cannot be separated by this method. Here is my example; and it doesn't help to bump c up to 4, 5, or any of the factors of 80, because the roots shift together. It's not a fool-proof algorithm, but almost.

A number with thousands of digits cannot be factored, but a polynomial mod p with thousands of terms can be factored quickly and easily. This is quite a surprise, and a lovely result. My software, made available in the prolog of this book, includes this factorization algorithm.

The routine is recursive, and is called with a polynomial, a multiplicity, a degree, and a shift value. Thus a polynomial is separated into two pieces, and those two pieces are factored, and so on; printing a list of irreducibles at the end.

At the start it is called with a polynomial q(x), a multiplicity of 1, a degree of 0, and a shift value of 0. Please factor q(x)¹, with no shift, looking for factors of any degree.

If q is a constant then return; there are no factors.

If q has no constant term, divide q by a power of x, and put this many factors of x on the list, x being irreducible.

If q is linear, of degree 1, then put it on the list and return; it is irreducible. Remember to tag q with its multiplicity when placing it on the list.

If degree is unspecified and the multiplicity is 1, perform a multiplicity analysis. Set y = q; we will need the original polynomial q later. At the top of the multiplicity loop, let y′ be the formal derivative of y. If this is 0 then all the exponents are divisible by p. By the frobenius homomorphism, y is a smaller polynomial raised to the p power. Divide all the exponents by p, multiply the multiplicity by p, and go back up to the top of the loop.

With y′ nonzero, let g be the gcd of y and y′. If g is 1 then break out of the loop; otherwise set y = g and go back up to the top.

If q is a product of distinct prime factors, then q is not something else to the p power, and q′ will have nothing in common with q. Conversely, assume q is s*h^j, for some maximal polynomial h and multiplicity j > 1. Assume p divides j. If h^j is all of q then q is reduced. If s is nontrivial then q′ = s′*h^j + s*j*h^j-1*h′. The second term goes away, and h^j is common to both. Since s′ is smaller than s, the gcd is proper, and q is reduced. If j is not divisible by p then h^j-1 is common to q and q′, and q′ is smaller than q, hence q is reduced. At the end of the loop, q is reduced by some factor y with high multiplicity, or q consists of distinct prime factors.

If y is different from q then it is a factor. Divide y^e out of q, and recursively invoke the algorithm with y and e. Then jump back up to the multiplicity loop with q/y^e.

If q was not reduced, then it has no multiplicity. Separate it into chunks by degree. If a chunk y of degree k is pulled out, invoke the algorithm with y, and the current multiplicity, and k as degree. Then call the algorithm on the rest of q.

If everything has degree k, but q is still composite, separate by shifting roots, as described in the previous section, then call the algorithm on the two pieces with the shift value, so that the factors thereof can be unshifted.

Once q is factored, you need to run through the list and regroup. Here's why. Let q = s⁸*t⁷ mod 7. The first derivative reduces t⁸ down to t⁷, and that becomes (st)⁷. The next trip through the multiplicity loop reduces (st)⁷ to st. Break out of the loop, with y = st, and ask how many factors of y are in q. There are 7, and once y is factored, s⁷ and t⁷ are on the list. But the quotient q/y⁷ contributes another instance of s to the list. Clean up by putting s and s⁷ together, giving s⁸. We're just tidying things up a bit.

Let F be a finite field of order w, where w is a prime power, and let E be a finite extension of F. For example, let E/F have dimension 4, and select a basis of length 4 to express E as an F vector space. Now every element in E is defined by 4 coefficients a b c and d, relative to this basis.

Let z be a primitive root for E, and consider the norm of z. This is z times its 3 conjugates. The 3 conjugates are z raised to the w, w², and w³. Thus the norm is z to the 1+w+w²+w³. Call this y, and raise y to the w-1 and get z to the w⁴-1, or 1. y is a w-1 root of 1, and y is primitive for F. Put this all together and z generates E^*, and |z| generates F^*.

Apply this to all the powers of z, and they map uniformly onto all the powers of y, which are all the nonzero elements of F. The powers of z wrap around the powers of y like thread around a spool. In fact the map is modular math; the exponent is reduced mod w-1. The distribution is uniform, and everything in F^* is hit 1+w+w²+w³ times.

z⁰	z¹	z²	…	z^w-1	z^w	z^w+1	…	z^2w-2	z^2w-1	z^2w	…
y⁰	y¹	y²	…	y⁰	y¹	y²	…	y⁰	y¹	y²	…

Return to the representation of E as an F vector space. The norm, i.e. the product of conjugates, becomes some formula n(a, b, c, d). Evaluate n at every possible a b c and d, other than a = b = c = d = 0, and you get an even distribution across the nonzero elements of F. Each one is hit 1+w+w²+w³ times.

Apply this to the case of F adjoin the square root of m, where m is not a square in F. Let u² = m, hence the conjugate of u is -u. The norm of a+bu is a²-mb². As long as a and b are not zero, the distribution of a²-mb² is even across F^*. Each element is hit 1+w times.

Instead of the square root of m, adjoin the cube root of 1. This looks like an extension of dimension 3, but it is actually quadratic, with x² = -x-1. Review the Eisenstein integers. The norm is now a²-ab+b², and it spreads evenly when x²+x+1 is irreducible over F. Each element is hit 1+w times.

All good, but what if the extension E/F is not a field? What is the distribution of a²-mb² when m is already a square in F? Sure, you can set E = F[x], polynomials in x over the field F, where x² collapses back to m, but if s is the square root of m in F, then (x-s) * (x+s) = x²-m = 0. E contains zero divisors, and E is not a field.

Still, E gives us a clue to the distribution. Add and multiply in E by adding and multiplying polynomials in x, then replace x² with m. It's just another flavor of modular math. So E is a ring extension, with basis 1 and x, having coefficients a and b drawn from F. The elements of e are F cross F, and the extension is 2 dimensional.

Map E into F by replacing x with s. This respects addition and multiplication, because s² = m, just like x². Everything works the same when x is replaced with s. But of course a+bs lies in F, hence a homomorphism from E into F.

Then there's another homomorphism from E into F via x → -s. Put these two homomorphisms together, side by side, and find a homomorphism from E into F cross F. Yes, it respects addition and multiplication, because it does so into the first copy of F, and the second copy of F.

What is the kernel of this homomorphism? What part of E maps to [0,0] in F cross F? When are a+bs and a-bs both 0? If the characteristic of F is not 2, then solve for a and b and get 0. Only 0 in E maps to [0,0] in F cross F. The map is an embedding. And since E is a 2 dimensional F vector space, and F cross F is a 2 dimensional F vector space, the map is actually an isomorphism. If this proof looks familiar, it should; it is just an instance of the chinese remainder theorem applied to polynomials.

Our expression, a²-mb², is (a+sb) * (a-sb), which is the image of E into the first copy of F, times the image of E into the second copy of F. Well E covers all of F cross F, so as a and b take on all their values, the product becomes everything in F times everything in F. This is 0 when either side is 0; which happens 2w-1 times. On the other hand, each nonzero product c is some nonzero v in the first copy of F, times c/v in the second copy of F. That happens w-1 times, and it happens the same way for every c. The distribution is even across the nonzero elements of F, just as it was when E was a field.

In summary,a²-mb² spreads evenly across F^*. If m is not a square in F, norm distribution tells us there is one instance of 0, with a = b = 0, and everything in F^* is hit 1+w times. As a sanity check, 1 + (w-1)*(1+w) = w²; all the pairs [a,b] are accounted for. Now, if m is already a square in F, 0 is hit 2w-1 times, and everything in F^* is hit w-1 times. Again the math works out: 2w-1 + (w-1)*(w-1) = w².

The distribution of a⁴-mb⁴ is even if w is 3 mod 4. At first this seems mysterious, but when w is 3 mod 4, squaring the squares merely permutes them. We still have all the same combinations of a² and b², in perhaps a different order, and the distribution remains even. In the same way, a⁶-bm⁶ is even if w is 2 mod 3. You can explore these and other distributions with this program.

To be complete, let F have characteristic 2. Take away 0, and there are w-1 units, which is an odd number. If the order of F is 32, there are 31 units. Select a primitive root, and discrete logs will run from 0 to 30. You can always divide a log by 2. Half of 10 is 5, and half of 7 is 19. Thus everything in F has exactly one square root. Given any value c, select any b, and there is a unique a such that a² = c+mb². Every c in F is hit w times, once for each b. The distribution is still even across the nonzero elements of F; in fact this time it is even across all of F.

Here's a fun corollary. Let p be a prime and let m be a nonsquare. Take all the squares mod p, including 0, and subtract m. Since m is a nonsquare, none of these come out 0. If the value is v, scale a by b and m by b^2, giving (ab)^2-mb^2 = vb^2. As long as b is nonzero, ab runs through all the values mod p. When b is 0 things get rather squashed. In this case, just use a. Now we have all the squares minus m times all the squares, and that is uniform. Half are squares and half are nonsquares. Remove a = b = 0, which doesn't participate anyways, then remove the set associated with b = 0, which is a^2, which is p-1 squares. We now have (p^2-1)/2 nonsquares and (p^2-2p+1)/2 squares. Group these together by a = 1, a = 2, a = 3 etc. Each set is multiplied by various values of b^2, but that doesn't change the quadratic residue. Divide through by p-1 and find (p+1)/2 nonsquares and (p-1)/2 squares. Group each nonzero value of a with its opposite, as they produce two copies of the same value a^2-m. The oddball, a = 0, produces a square if p = 3 mod 4, and a nonsquare if p = 1 mod 4. In the first case, take that square away and there are (p-3)/2 squares remaining, thus (p-3)/4 distinct nonzero square values, or (p+1)/4 square values. This agrees with the nonsquare values. There are just as many squares as nonsquares. Subtract m from all the squares mod p, and exactly half will be squares and half will be nonsquares. For example, let p = 11, and the squares are 0, 1, 3, 4, 5, and 9. Add 1 (subtracting -1) and see that 3 out of 6 land on squares. Similarly for mod 19, with squares of 0, 1, 4, 5, 6, 7, 9, 11, 16, 17. In the second case, where p = 1 mod 4, a = 0 contributes 1 nonsquare value. That leaves (p-1)/2 distinct nonzero square values and (p-1)/2 distinct nonsquare values. Bring a = 0 back in and there is exactly one more nonsquare than square. For example, set p = 13, with nonzero squares 1, 3, 4, 9, 10, 12. Since 2 is not a square, subtract 2, and note that 3 out of 6 land on squares. Bring in 0-2 if you like, the last nonsquare.

Now, what happens when m is a square? Exclude a = ±m, then scale everything by b, then replace b = 0 with the stand-alone values of a. There are (p-1)^2/2 squares and nonsquares. Pull away the stand-alone squares as we did before, leaving (p^2-4p+3)/2. Divide by p-1 and find (p-1)/2 nonsquares and (p-3)/2 squares. Account for the oddball a = 0, in one column or the other, as p is 1 or 3 mod 4. If p = 1 mod 4, we have (p-1)/4 squares and nonsquares, but bring back in a = ±m, for a value of 0, for one more square. If p = 3 mod 4 the sets are equal. Example, mod 11, with squares 0, 1, 3, 4, 5, 9, subtract 1 and find 3 squares and 3 nonsquares. You can verify this using this program.

Recall the descending groups of matrices: all matrices, invertible, determinant ±1, orthonormal, and orthonormal with determinant 1 (rotations). When the entries come from a finite field F, the number of matrices is finite. We now have the machinery to calculate the size of each group. Let w be the order of F, where w is a prime power.

All matrices

Invertible

Determinant ±1

Orthonormal

Rotations

If the matrix is n by n, any of w elements can be in any of n² positions, hence there are w to the n² possible matrices.

If M is invertible, build n linearly independent rows, one at a time, starting at the top. The first row can be anything but 0, hence there are wⁿ-1 possibilities. Once selected, the second row can be anything other than a scale multiple of the first. That leaves wⁿ-w possibilities. The third row can be anything other than the space spanned by the first two rows. This space has size w², hence there are wⁿ-w² possibilities. Continue down to the bottom row, which can be filled in wⁿ - w^n-1 ways. Multiply these expressions together to count the invertible matrices.

Pull the powers of w out of each expression, and find w raised to the (n²-n)/2. That leaves (w-1) × (w²-1) × (w³-1) × … (wⁿ-1).

If the determinant is 1 then the last row, the bottom row, has to be scaled by 1/det(M). The scaling factor is anything from 1 to w-1, so throw away w-1 in the previous expression to count matrices with determinant 1.

w^(n×n-n)/2 × (w²-1) × (w³-1) × … (wⁿ-1)

Double this if you want a determinant of ±1. Or just leave it alone if F has characteristic 2, because 1 and -1 are the same.

Orthonormal is trickier, as you might imagine. Approach this by counting vectors of certain lengths. How many vectors are there of length 0, or 1, or q, where q is your favorite nonsquare in F. This is all you need, because a vector of length 1 can be scaled by s to give a vector of length s², and a vector of length q can be scaled to give a vector of length qs². If the counts are z a and b, for 0 1 and q respectively, then z + (w-1)/2 × (a+b) has to equal wⁿ, accounting for all the vectors.

In characteristic 2 you don't have to worry about q, since everything is a square. Thus z + (w-1) × a = wⁿ.

I will use the notation O_n for orthonormal of determinant 1 in n dimensions. Let's start by analyzing O₂.

Let the top row of a matrix in O₂, having length 1, be [a,b]. If either is 0 then the other is ±1. Put ±1 in the opposite corner to complete the matrix. If a and b are nonzero, and if the two rows are to be orthogonal, both entries in the second row are also nonzero. Scale the bottom row so that the lower right entry equals a. (We'll scale it back later if necessary.) The antidiagonal elements must now be opposite. [a,b] and [-b,a]. This has a determinant of a²+b², which is already 1, since the top row has length 1. We didn't need to scale the bottom row after all. The matrices of O₂ correspond to the pairs a,b such that a²+b² = 1. You can think of these as vectors on the unit circle in the complex plane. The top row holds this vector, while the second row, the second vector, always runs 90 degrees ahead of the first. This is only an analogy, since F is a finite field, rather than continuous rotations of the xy plane.

What is the size of O₂? It is the number of vectors of length 1, the number of pairs [a,b] with a²+b² = 1. The previous section tells us the distribution is even across all of F^*, so start with w², take away the pairs that yield 0, and divide by w-1. Of course 0 yields 0, and if any other a²+b² = 0, then a/b is a square root of -1. This happens iff w = 1 mod 4, hence this formula, and many formulas to come, depend on w mod 4. The size of O₂ is w (characteristic 2), w-1 (w = 1 mod 4), and w+1 (w = 3 mod 4). Double this if you want reflections, orthonormal matrices with determinant -1. In this case the bottom row is a vector that runs 90 degrees behind the first vector.

Take arbitrary matrices [a,b],[-b,a] and [c,d],[-d,c] and verify that multiplication commutes, like rotations in the plane. In fact the top row carries all the information; the bottom row just goes along for the ride. This will not be the case in higher dimensions.

To count O_n, consider the orthonormal matrices with a predefined top row r. Let S be one such matrix, the standard matrix. Premultiply by any matrix that is 1 in the upper left and orthonormal in its lower right block of size n-1. The top row is still r, and the resulting matrix is orthonormal. This builds a map from orthonormal matrices of dimension n-1 into orthonormal matrices of dimension n with a top row of r.

Next, reverse this map, thus building a bijection. Take any orthonormal matrix M with top row r and multiply on the right by S inverse, (also known as S transpose), and the result is orthonormal, with 1 in the upper left. Therefore the number of matrices in O_n with top row r is equal to the number of orthonormal matrices of dimension n-1. Of course this assumes a standard matrix S exists.

Look at the case n = 3 over a finite field F of size w. How many top rows r are there? Remember that the length of r is 1. When the upper left entry is ±1, the other two have squares that sum to 0. This contributes 4w-2, or 2, as w is 1 or 3 mod 4. For anything else in the upper left, the remaining two squares sum to something nonzero. We showed earlier that sums of squares are distributed evenly across the nonzero entries of F. This yields w-2 times (w-1 or w+1), as w is 1 or 3 mod 4. Add these up and get w²+w or w²-w, as w is 1 or 3 mod 4. Finally, multiply this expression by the size of O₂, and get w(w-1)(w+1), regardless of w mod 4. That is the size of O₃, but we still have to demonstrate a standard orthonormal matrix S for each row r.

Start with r, as the first vector in n space, and extend this to an n dimensional basis. Scale the bottom row, so that the determinant is 1. Then apply the gram schmidt process to create an orthogonal matrix. The determinant is still 1.

Dot each row with itself, and multiply the results together, and get the square of the determinant, which is 1. Thus an even number of dot products are nonsquares.

Scale each row independently, so that its self dot product is 1 or q, where q is your favorite nonsquare in F. (At this point the determinant might not be 1.) If all dot products are 1 then S is orthonormal, and we're done. Otherwise consider a pair of rows with dot products equal to q. Call these two vectors u and v. Replace u with au+bv, and v with -bu+av, where a and b are arbitrary scalars in F. Dot either of these linear combinations with any other row and the result is still 0. Verify that (au+bv).(-bu+av) = abq - abq = 0. The matrix is still orthogonal. Now evaluate (au+bv).(au+bv) and get (a²+b²)q. Similarly, (-bu+av).(-bu+av) = (a²+b²)q. Since sums of squares are evenly distributed across F^*, there is an a and b such that a²+b² = 1/q. The two rows now have length 1. Do this for every pair of nonsquare rows, and S becomes orthonormal. That completes the proof.

Here is some notation to help us move through higher dimensions. Let o_n be the number of matrices in the group O_n. Let z_n be the number of n dimensional vectors with length 0, and let y_n be the number of n dimensional vectors with length 1. We know that y₂ would be the same for any nonzero length, because a²+b² spreads evenly across F^*. This holds for even indexes, but not for odd indexes. When n is odd, let q_n count the number of vectors whose length is a given nonsquare.

Here is a review of what we know so far. When it matters, w = 1 mod 4 is given first, then w = 3 mod 4.

z₁ = 1, y₁ = 2, q₁ = 0, o₁ = 1.

z₂ = 2w-1 | 1, y₂ = w-1 | w+1, o₂ = w-1 | w+1.

z₃ = w², y₃ = w²+w | w²-w, q₃ = w²-w | w²+w, o₃ = w³-w

When n is even, build z_n in two ways. Let the first n-2 entries have squares that sum to 0, and then bring in the last two entries. This gives z_n-2z₂. Or the first n-2 entries could produce something nonzero, and the last two must take up the slack.

z_n = z_n-2z₂ + (w-1)y_n-2y₂

Similar reasoning holds for y_n.

y_n = z_n-2y₂ + y_n-2z₂ + (w-2)y_n-2y₂

Even though z₂ and y₂ depend on the value of w mod 4, z₄ and y₄ do not.

z₄ = w³+w²-w, y₄ = w³-w, o₄ = w²(w-1)²(w+1)².

Verify that our formula for y₄ comes out the same for a nonsquare length. Like y₂, y₄ is the same across the nonzero lengths. In fact, as you step through n = 6, 8, 10, and so on, q_n always equals y_n. The distribution of vectors is even across the nonzero lengths, though it may still depend on w mod 4.

When n is odd we need some different formulas. Instead of separating the first two entries, make the first entry 0 or nonzero and derive the following.

z_n = z_n-1 + (w-1)y_n-1

y_n = (w-2)y_n-1 + 2z_n-1

q_n = wy_n-1

z₅ = w⁴, y₅ = w⁴+w², q₅ = w⁴-w², o₅ = w⁴(w-1)²(w+1)²(w²+1).

If you move on to dimension 6, use the formulas based on z₄, y₄, z₂, and y₂. In 6, as with 2, the formula for o depends on w mod 4.

For dimension 7, use the "odd" formulas, as we did with 5. Continue this process as far as you like.

z₆ = w⁵+w³-w² | w⁵-w³+w², y₆ = w⁵-w² | w⁵+w², o₆ = y₆o₅

z₇ = w⁶, y₇ = w⁶+w³ | w⁶-w³, q₇ = w⁶-w³ | w⁶+w³, o₇ = y₇o₆

z₈ = w⁷+w⁴-w³, y₈ = w⁷-w³, o₈ = y₈o₇

When w is even, i.e. a power of 2, z_n = y_n = w^n-1, and o_n = w^n×(n-1)/2. This is left as an exercise.

Like a finite field, O₂ has a primitive matrix U that generates all the others.

U, u², u³, u⁴, u⁵, … 1

The top row of a matrix in O₂ is a unit vector [a,b], such that a²+b² = 1. The bottom row is [-b,a], but that just goes along for the ride, and carries no new information.

Multiply two matrices, with top rows [a,b] and [c,d], and get a matrix with top row [ac-bd,ad+bc]. Does this look familiar? It is the formula for multiplication in the complex plane. If i is the square root of -1, then a+bi * c+di = (ac-bd) + (ad+bc)i. It's exactly the same formula.

Assume F has no square root of -1. Thus F[i] is a true field extension, giving a new field E, whose members can be written a+bi. E has a primitive root z, with log 1. Let y = z^w-1, where w is the order of F. Thus log(y) = w-1. The powers of y have logs that are multiples of w-1, all the way up to (w-1) times (w+1), which is w²-1, which is the same as a log of 0, giving 1 in E. Write y as a+bi, and y seeds a cycle of matrices, having a period of length w+1. We need to show that y, and thus the powers of y, are precisely those elements with a²+b² = 1. This is the formula for the norm of E over F. When does an element have norm 1? We showed earlier that the norm of z is z^w+1. The elements with norm 1 are the w+1 roots of 1, which are the powers of y. The matrix associated with y is primitive for O₂.

If F already has a square root of -1, then E = F[i] is isomorphic to two copies of F running in parallel, by the chinese remainder theorem. If (a+bi) times (a-bi) = 1, then apply the two homomorphisms, and the element of F on the left, times the element of F on the right, equals 1. The right is always the inverse of the left, so focus on the left, a+bs, where s is the square root of -1 in F. Multiplication upstairs becomes multiplication downstairs, and F (downstairs) has some primitive root y. The preimage of y, a+bs, written as a+bi, defines the matrix that generates O₂.

Finally let F have characteristic 2. This time the size of O₂ is w, or 2ⁿ. Square a+bi and get 1. That's right, everything is a square root of 1; everything is an involution. And multiplication commutes. Therefore O₂ is the direct product of n copies of Z/2. Yeah, it's weird, but it doesn't come up very often.

Is there an efficient procedure to find unit vectors, i.e. members of O₂? There is, and it works mod n, even if n is a thousand digits long and its factorization is not known.

Choose random values of c, just above the square root of n for efficiency, then ask whether c² mod n is a prime that is 1 mod 4. As soon as you find one, solve for a and b such that a² + b² is equal to this prime. That gives a² + b² = c² mod n. Divide through by c² and relabel the variables so that a² + b² = 1. Given a few such unit vectors, at least one is likely to be primitive, i.e. a generator of O₂.

Here is a brief diversion into continuous math. What does O₂ look like over the reals? The orthonormal vectors lie on the unit circle, which I will place in the complex plane. As shown above, multiplying two matrices in O₂ is the same as multiplying their top-row vectors in the complex plane. By Demoivre's Formula, this adds their angles. Thus O₂ is isomorphic to θ from 0 to 2π, with matrix multiplication corresponding to angle addition. As a topological space, O₂ is a circle.

What about SL₂(R), that is, the 2×2 matrices with determinant 1? This includes O₂, so intuition suggests it is based on a circle.

The geometric way to think about it is to start with 2 vectors u and v in the plane having determinant 1. Spin them around in the plane so that u runs along the positive x axis. This is the parameter θ that defines the circle. Now u is [a,0] and v has to be [?,1/a] to give a determinant of 1. Thus v is in the upper half plane. Its y coordinate is 1/a and its x coordinate is undetermined.

Add a multiple of u to v so that it points straight up the y axis. This is any real number, in other words, a line. So far we have a circle and a line.

This translation, adding a multiple of u to v, does not change the determinant of u and v; the determinant is still 1.

Finally, scale u by 1/a and v by a to get the unit vectors [1,0] and [0,1]. The parameter a is any positive real number, which is a positive ray. Since this ray is open at either end, it is topologically equivalent to a line. Therefore SL₂ is homeomorphic to a circle cross a line cross a line, or a circle cross a plane. If you're a tiny fly, it looks like 3-space. You can jiggle 3 of the 4 entries in the matrix any way you like, and the fourth entry moves in sync to keep the determinant 1. You have 3 dimensions of freedom. It looks like 3-space locally, hence it is a 3-manifold, but if you travel far enough you can come back where you started, tracing the circle of θ. This is not the same as xyz 3-space.

Since orthonormal matrices preserve lengths, they map unit vectors to unit vectors. Multiply u by a matrix M and get another unit vector v. Since M is invertible, the action is a permutation. The product of two matrices induces the first permutation followed by the second, which is the composition of the two permutations. This is an instance of a general topic, groups acting on sets, which is coming soon.

Consider the action of O₂. Let M be a matrix in O₂, hence the top row of M is a unit vector, and the bottom row goes along for the ride. The action of M multiplies the top row by the unit vectors, as though they all lived in the complex plane.

Let U generate all of O₂. Now U shifts all the unit vectors around in a cycle, just as U shifts all the matrices in O₂ up by 1. In fact the top row of these matrices are the unit vectors. This is a cycle on an even number of elements, hence it is an odd permutation. O₂ induces both even and odd permutations on unit vectors, and the even permutations are induced by the squares in O₂, i.e. the even powers of U. We will see that the same thing happens in O₃, but it's more complicated.

[ A circle of 12 dots, with arrows pushing them around counterclockwise, and U above one of the arrows. ]

Let M be the 3 by 3 matrix with U (as above) in the upper left and 1 in the lower right. Thus the order of M is w-1 or w+1, as driven by U in O₂. M is in O₃, and when M acts on the vector [x,y,z], it always maps z to z. If x and y are 0 then M fixes z. (This makes z 1 or -1.) What other vectors are fixed by M?

Assume either x or y is nonzero. If the top row of U is [a,b], then the action of U on [x,y] is [ax-by,ay+bx].

ax-by = x
ay+bx = y

x = by/(a-1) { a is not 1 in U }

ay+b(by/(a-1)) = y { substitute for x in the second equation }

ay+(1-a²)y/(a-1) = y { a² + b² = 1 }

ay-(1+a)y = y

-y = y

y = 0 { assume not characteristic 2 }

ax = x { y = 0 in the first equation }

x = 0 { a is not 1 }

This isn't specific to M; any power of M, that is not the identity matrix (with a = 1), fixes only [0,0,±1].

Imagine a unit vector s that is mapped to the same vector by two different powers of M representing two different matrices in O₃. If s is a vector with x or y nonzero, and sMⁱ = sM^j, then M^i-j fixes s. That can't happen, hence the powers of M map s to distinct vectors. Seen another way, s, sM, (sM)M, (sM²)M, etc are distinct, until we get back to s. M pushes these vectors around in a cycle of length w-1 | w+1. Each of these cycles is an odd permutation.

How many such cycles are there? There are w²+w | w²-w unit vectors, and two of them are fixed by M. Subtract 2 and divide by w-1 | w+1, giving w+2 | w-2. There are an odd number of these cycles, and the permutation is odd. Therefore O₃ induces both odd and even permutations.

Let S be the set of matrices that induce Even permutations, and let T be the set that induce odd permutations. There is a bijection between these two sets; multiply by M, or by M inverse, to go back and forth. Half of O₃ is even and half is odd, as you would expect.

All the squares, i.e. a matrix times itself, belong to S. In fact S consists precisely of squares, but we'll need a counting argument to prove that.

O₃

Squares

Review the monomorphism from the projective quaternions with norm 1 into O₃. How many quaternions have norm 1? These are vectors of length 4 with norm 1, and the answer is w³-w. Since each quaternion is equivalent to its opposite, divide this by 2. And how big is O₃? The answer is once again w³-w. Exactly half of these are images of quaternions with norm 1. At the same time, each one of these has a square root in O₃. Half of O₃ are squares, and these induce even permutations, while the nonsquares induce odd permutations. The squares fill up all of S, leaving the nonsquares for T.

When the base is a finite field, the quaternions are the same as the 2 by 2 matrices. In other words, they are isomorphic. One is merely a relabeling of the other. This is not the case over the reals, or any other ordered field. Recall that the quaternions over the reals form a division ring, wherein everything nonzero is invertible. Yet there are plenty of 2 by 2 matrices that are not 0, and not invertible. Put anything you like in the top row, and [0,0] in the bottom.

Characteristic 2 fails as well. The quaternions commute, but the matrices do not.

We want a ring homomorphism from the quaternions into the 2 by 2 matrices, hence 0 maps to 0 and 1 maps to the identity matrix. Select images for i j and k, and the entire map is determined. Of course i j and k must respect i² = 1, ij = k, and so on. Here is my map, and I'm sure it's not the only one out there.

1 →

1	0
0	1

i →

-1	-1
2	1

j →

v	x
2x-2v	-v

k →

v-2x	v-x
2x	2x-v

Assume 2x² - 2vx + v² + 1 = 0, so that j² = -1.

Verify that i² = -1, j² = -1, ij = k, and ji = -k. It turns out this is all you have to do.

k² = (ij)(ij) = ijij = -iijj = -1

jk = jij = -jji = i

kj = ijj = -i

ik = iij = -j

ki = iji = -jii = j

Given v, can we find x? The discriminant of the quadratic is -4v²-8. We can solve this whenever -v²-2 is a square. Remember that v² covers half the nonzero values in our finite field, along with 0. Negate this set and subtract 2 to find a new set of the same size. This has to cover at least one square. There simply aren't enough nonsquares to accommodate a set of this size. Therefore there is a v, and an x, that brings this homomorphism to life.

The map can be derived mod n, even if n is a thousand digits long and its factorization is not known. Select values of c at random, just below the square root of n for efficiency, until n-c² is a prime that is 1 mod 8. Find a and b such that a²+2b² is this prime. Thus a² + 2b² = -c². Divide through by -b² and relabel the variables to get -a² - 2 = c². Let a play the role of v in the above, and bring the homomorphism to life.

To reverse this map, to go from matrices back to quaternions, you need to invert the following matrix.

1	0	0	1
-1	-1	2	1
v	x	2x-2v	-v
v-2x	v-x	2x	2x-v

The determinant is 4, which is a unit. The inverse is the following matrix divided by 2. This proves the map is an isomorphism. The quaternions and the 2 by 2 matrices are really the same thing.

1	1	-v	2x-v
0	-2	2v-2x	-2x
0	1	-x	x-v
1	-1	v	v-2x

Amazingly, the norm of a quaternion is the determinant of its image. Start with a quaternion a+bi+cj+dk and turn it into the following matrix.

a - b + cv + dv - 2dx	-b + cx + dv - dx
2b + 2cx - 2cv + 2dx	a + b - cv + 2dx - dv

Then use this program to find the determinant, which comes out a² + b² + c² + d².

The quaternions with norm 1 become the matrices with determinant 1. When the base field is finite, we already calculated the size of both groups; they agree at w³-w.

As shown above, there is a solution to x² - 2xv + v² + 1 = 0 mod p. In particular, there is a discriminant z such that z² + v² + 2 = 0. Both variables cannot be 0 simultaneously. Without loss of generality, assume z is nonzero. Extend this equation to p², where v takes on any value mod p² that is compatible with v mod p. z is still correct nod p, but may need to be adjusted to be coorect mod p². For instance, add 3p to z, and you add 6zp to z². Thus the equation is satisfied mod p², and the quaternions and the 2 by 2 matrices are isomorphic mod p².

Continue this process up to p³, p⁴, and so on up to p^k. Thus the quaternions and the 2 by 2 matrices mod p^k are isomorphic.

Do this for several prime powers in parallel, and find an isomorphism mod n by the chinese remainder theorem.

Push the relationship up forever, through each level p^k, until v, z, and x are infinite series of values mod p^k. This establishes the isomorphism over the p-adic numbers. In fact, using these values of x and v, the isomorphism holds for the fraction field.

Finally consider the complex numbers. You can take the square root of anything, so v leads to z, leads to x, leads to an isomorphism.

The algebra that equates norm and determinant is still valid, hence the norm of the quaternion equals the determinant of the corresponding matrix, over any field.

When the underlying field has a topology, like the p-adic numbers or the complex numbers C, then the quaternions over C also have a topology, implied by the direct product of 4 copies of C. The 2 by 2 matrices over C have the same topology. The map in either direction is linear and continuous. A bicontinuous isomorphism is a homeomorphism; the topological rings are the same.

Return to a finite field, and the quaternions with norm 1 are isomorphic (under multiplication) to the 2 by 2 matrices with determinant 1. Equate each quaternion with its opposite, and each matrix with its opposite. These two groups are now isomorphic to the "even" orthonormal matrices of O₃. A theorem in one world carries over to the other two.

Matrices
det 1
/ ±1

⇔

Quaternions
norm 1
/ ±1

⇔

O₃
even

I will close this chapter on finite fields with a very interesting structure, the algebraic closure of the integers mod p.

Call this F, and suppose F is finite. Let b be primitive in F. Since b has no square root, F is not closed. (If p = 2, take some other root of b.) Therefore F is infinite.

In fact F is countably infinite, since the polynomials mod p are countable.

For each integer k, there is but one subfield of F of dimension k. If there were more then there would be too many roots for the polynomial x to the p^k-1 = 1.

The finite subfields of F form a lattice that corresponds to the integers under factorization. The field of dimension 2, and the field of dimension 3, have to live in the field of dimension 6, or a multiple of 6, hence the lcm of the dimensions gives the smallest field containing a set of finite fields. Similarly, the gcd of the dimensions gives the intersection of a set of fields.

Bring in these finite fields one at a time, building an ascending chain of fields. Each has a dimension that is the lcm of the first n integers. Let E be the union of this ascending chain of fields. Clearly E belongs to F. Suppose E is not F. Some u, not in E, is algebraic over E. This makes u algebraic over Z/p. If the dimension of u is n, then u is folded into E at step n or earlier. Therefore E = F.

You can build a proper infinite subfield of F; bring in all the finite fields of even dimension.

Without 0, each subfield is generated by a primitive root. This implies a multiplicative cycle, and these cycles get larger and larger, each containing the previous, as you climb up any chain of subfields in F.

Suppose F^* is generated by a finite set of elements. These elements live in a finite field, and cannot generate all of F.