Add 1 to itself again and again in a finite field and the numbers must eventually return to 0, giving a characteristic of p. Thus Z/p is at the base of every finite field.
In fact Z/p is itself a field, and the underlying reason for this is unique factorization in the integers. For a nonzero x, multiply the nonzero integers mod p by x and you never get 0. Otherwise xy = kp, and p would divide x or y. Similarly, xa ≠ xb, else x*(ab) would be 0. Multiplication by x maps a finite set into itself and is injective. The map becomes a permutation, hence there is some y with xy = 1. This makes x invertible, and Z/p is a field.
In fact there is only one field of order p. The identities 0 and 1 are fixed, and 1+1 has to be 2, and 2+1 has to be 3, and so on until the field is set.
Are there larger finite fields on top of Z/p?
Well you can always extend a field. Adjoint the indeterminant x to get (Z/p)(x), the fractions of polynomials in x with coefficients mod p. This is an interesting field, but it's not finite. If you want a finite field, you need a finite, i.e. algebraic, extension of the base field Z/p. That's not a problem; adjoin anything that is missing, such as the square root of 3 mod 7.
A finite field extension implies a finite basis. If the basis has size n, for an n dimensional extension, then the number of vectors is pn, selecting a coefficient mod p for each basis element. Everything in F is uniquely spanned by this basis, hence the order of F is pn.
There is no field of order 35. There is however a field of order 49. In general, you can adjoin a root u of an irreducible polynomial over a base field K, and K[u] becomes a larger field, whose dimension is the degree of the polynomial. Let u satisfy the irreducible polynomial x23, mod 7, and the dimension is 2, with basis 1 and u. The size is 49. Typical elements are 3+u, 6+5u, 4, 3u, 1+2u, etc.
Math in a finite field is done in the usual way; multiply polynomials together, with coefficients mod p, then collapse the result as directed by the irreducible polynomial.
Let F have order pn, hence there are m = pn1 nonzero elements. Each of these is invertible, hence a unit. They cannot be zero divisors. Thus multiplication by x permutes the nonzero elements of F. From here the proof of Fermat's little theorem applies, almost word for word. Let t be the product of all the units. Replace each v with x*v and multiply them all again. The result is t*xm. Thus every nonzero x, raised to the pn1, is 1 in F.
The proof of the primitive root is once again the same. Replace p1 with m = pn1, and let d loop over all the divisors of m. There is some element with order m. Actually there are φ(m) elements with order m. Select any one of these, call it b, and b is a primitive root of F. As such, b can be used as a base for discrete logs.
Can we tell at a glance whether 1 and 2 are squares in F, similar to the integers? The answer is yes, and the criteria are the same.
If 1 is y2, then y4 = 1. The discrete log of y, times 4, is pn1, hence pn1 is a multiple of 4. This can be reversed to find y, the square root of 1. Therefore 1 is a square in F iff the order of F is 1 mod 4. This happens iff p is 1 mod 4, or p = 3 mod 4 and the dimension of F is even.
In characteristic 2, 1 = 1, and 1 is the square root of 1, so technically 1 is a square.
If p = 1 or 7 mod 8 then 2 is a square in Z/p, and it remains a square in F.
Assume p = 3 or 5 mod 8, and the dimension of F is odd. If 2 has a square root in F then adjoin it and find an extension of dimension 2. This cannot live in a larger extension of odd dimension; therefore 2 is not a square.
Finally let p = 3 or 5 mod 8 and let the dimension of F be even. Let m = pn1, so that the logs run from 0 to m1. Since p to an even power is 1 mod 8, m is divisible by 8. Let y have a log of m/(p1). Raise y to the p1 and get 1, and all the powers of y are distinct, hence they are the roots of xp11 = 0, and they are the integers. In fact y is primitive for the integers mod p. Since p1 is 2 or 4 mod 8, the log of y is even. The log of every integer is even, and every integer, including 2, has a square root.
If w is the order of F, the rules are the same as they were mod p: 1 is a square iff w = 1 mod 4, and 2 is a square iff w = 1 or 7 mod 8.
There is but one field, up to isomorphism, for each prime p. This was described at the start of this chapter. In the same way, there is but one field for each prime power. Adjoin the square root of 3 to the integers mod 7, or the square root of 5. There is an isomorphism between these two fields of order 49; they are essentially the same.
Let F have order pn, with m = pn1 units. As shown above, the units are the mth roots of 1. Therefore F is the splitting field for xm1 over the integers mod p. This is the unique characterization of F.
If you want to be technical, factor xm1 into irreducibles, and select the largest irreducible, ordered in any way you wish. Adjoin u, the root of this polynomial. This defines the next finite field. If it is not all of F then factor xm1 over (Z/p)(u), giving xu along with other irreducibles. Once again select the largest irreducible, and let v be a root. Adjoin v, which builds the next field extension. Continue along this path until you reach F. Any field of order pn has to have this structure. Given two such fields, equate 1 with 1, and u with u, and v with v, and so on to build the isomorphism.
This proves the field is unique, but does it exist?
Let r(x) = xm1, and use formal derivatives to look for any repeated roots. The derivative is mxm1, which has a root of 0. This cannot be a root of r(x), hence all the roots of r(x) are distinct.
Adjoin these roots to Z/p, and call the resulting field F. With 0 included, the order of F is at least m+1 = pn. This includes the integers, since any integer to the p1 is 1, and p1 divides m.
Let c be the frobenius homomorphism c(x) = xp. Every field homomorphism is injective, and since the field is finite, with dimension no larger then m!, c is also surjective, hence a field automorphism.
Let d = cn. Thus d is an automorphism, the result of applying c n times. Let L be the subring of F that is fixed by d. If d fixes x, and y is x inverse, then write xy = 1 and apply d, whence d(y) has to equal y. The subring L is a field.
By construction, x lies in L iff xm+1 = x. This is 0 and the m roots of 1, i.e. 0 and the roots of r(x) Therefore L is a finite field of order pn. In fact F, built from the roots of r(x), = L.
If a field L is finite, of order pr, there is exactly one n dimensional extension up to isomorphism, namely the finite field of order prn.
You might be wondering if this works for an extension of dimension p. Well there is nothing in the above proof that precludes it. Of course you can't adjoin a pth root of an integer c, since xpc is not irreducible; in fact it is (xc)p. But there is some irreducible polynomial of degree p.
Consider p = 2, and extend the integers mod 2 up to a field of order 4. The recipe says to adjoin the cube roots of 1. In other words, adjoin x31, or an irreducible factor thereof. With 1 = 1, we have the following factorization.
x3 + 1 = (x2 + x + 1) * (x + 1)
The first factor is irreducible of degree 2, and generates the finite field of order 4.
Let F be a finite field of order pn, and let c be the frobenius homomorphism c(x) = xp. Since the domain is a field, c is a monomorphism, and since c maps a finite set into itself, c becomes an automorphism.
Repeat c again and again, denoted ck, to get other automorphisms, until cn raises each x to the pn, which is just x, whence cn is the identity map. After you apply c n times, you are back to start.
Here is another way to view it. Let b be a primitive root of F, so that the powers of b run through all the units of F. Take discrete logs, and the log of b is 1, and the powers of b step through the integers 1 through pn1, which is back to 0, that is, b0 = 1. Given k < n, b is raised to the pk, which lands on something other than b. You don't get back to a trivial automorphism until cn. Thus there are n distinct automorphisms, forming a cycle of length n.
c, c2, c3, c4, … cn = 1
Are there any other automorphisms? Let an automorphism map b to bj, whence this same automorphism has to raise every power of b to the j. In other words, it multiplies the log by j. If j is divisible by p then it includes an instance of c, the frobenius automorphism. Apply c first if you like, then raise everything to the j/p. Therefore we can assume j is not divisible by p.
Since j is only meaningful mod m, where m = pn1, we can assume j is strictly between 1 and m. For every x, (x+1)j = xj + 1j. This becomes a polynomial of degree ≤ m2, with lead term jxj1. There can be at most m2 solutions, yet we require m+1 solutions. This is a contradiction, hence all the automorphisms are generated by c as described above.
The subfields are determined by the automorphisms. Let k be a factor of n, and let L be the subfield fixed by ck. Thus x is in L iff x raised to the pk gives x back again. This means x = 0 or x is a pk1 root of 1. Look at these roots of 1 by taking discrete logs base b. Let r = (pn1)/(pk1). If y = br, then log(y) = r. Any multiple of r, times pk1, is a multiple of pn1, which takes us back to b0 = 1. The powers of y, corresponding to the multiples of r, give all the pk1 roots of 1. This is our subfield L inside the larger field F.
As k steps through the divisors of n, ck fixes ever larger subfields, starting with k = 1, which fixes only the integers, (0 and the p1 roots of 1), all the way up to k = n, where cn fixes all of F. This is not a tower of subfields however; when n = 6, the subfield of order p2 does not live in the subfield of order p3.
[ venn diagram big shape labeled F order p6 then two regions inside labeled p3 and p2, intersecting in region at the bottom labeled integers p1 ]
Are there any other subfields? By dimensionality, any subfield L has to have a size pk, where k divides n. There is at least one such subfield, fixed by ck. If there is another, then there are more than pk1 solutions to the polynomial x to the pk1 = 1, which is impossible. Therefore there is one subfield for each divisor of n.
That's one subfield for each cycle of automorphisms within the larger cycle generated by c. This is a wonderful correspondence between the subfields of F and the subgroups of the automorphisms of F. This is galois theory, and we're definitely going to explore this in a later chapter, because it's just too beautiful to miss.
[ hexagon with c0 through c5 on the vertices, * at c0 for the trivial automorphism, edge for the hexagon cycle of length 6, inscribed triangle for cycle of length 3, and a diagonal for cycle of length 2, all connected to c0, caption 4 cycles build 4 subfields ]
Let u be any element that generates all of F. Thus u satisfies an irreducible polynomial q of degree n.
The images of u under the automorphisms are u to the p, u to the p2 , and so on up to u to the pn, which gives u back again. Raise q(u) to the p power, using the frobenius automorphism, and the coefficients of q do not change. Thus each image of u is a root of q. The n conjugates of u are the n roots of q. Each automorphism performs a circular shift on these roots. Apply c, and these roots shift around by 1.
[ u u2 u3 u4 u5 u6 back to u around in a circle with c in the center ]
Let's use this example to dispel a common myth. The conjugates need not span the extension as a vector space. The powers of u span, from 1 to un1, but not the conjugates of u. Adjoin u2+1 to Z/3. Since 2 is not a square, this is irreducible. The "other" square root of 2 is u, hence u + its conjugate is 0. In other words, u+u3 = 0. The conjugates are not linearly independent, and cannot span the entire vector space of dimension 2.
Also, the adjoined element need not be primitive. In the above example, u4 = 1. Thus u does not have order 8, and is not primitive. A primitive element is u+2, whose square is u.
To use a finite field in a real world application, such as error correcting codes, you need a representation of that finite field as an irreducible polynomial mod p. How many irreducible polynomials are there?
A polynomial of degree n is irreducible iff its roots generate all of F, iff its roots do not lie in a smaller subfield. Count the elements in F, but not a subfield of F, and divide by n; that is the number of monic irreducible polynomials mod p.
A maximal proper subfield has order pk, where n/k is some prime d. An element x may live in none of these subfields, or 1, or 2, or 3, etc. This suggests inclusion exclusion. Here is the formula for the top, for the first few values of n.
[ Same venn diagram as earlier, with F = p6 but also label this as the top and shade or color this region ]
The first term, pn, dominates everything else. Divide by n to count the irreducible polynomials. That's pn/n. Since there are pn monic polynomials in total, choose one at random and it will be irreducible with probability 1/n. But how do you know it's irreducible? What is the test?
Review the pseudoprime test and the proof of primality, the procedures here are completely analogous.
Let q(x) be a polynomial that might be irreducible. Let u be a primitive element, some polynomial in x. Thus u generates F*. Remember that x itself might not be primitive, but there's a good chance it is, so try u = x first. If this fails, just switch to another polynomial; that's faster than setting u to various polynomials in x and doing all the math.
Let m = pn1. If q is irreducible and x is primitive, then xm = 1. If this fails then go back and try another polynomial q(x).
If xm = 1, you can try the same test for other polynomials in x; they should all come out 1. But this really doesn't help very much. There are lots of reducible polynomials that will pass this test every time. Let n = 4 and let q(x) be the product of two different irreducible quadratics r(x) and s(x). Note that p21 divides evenly into m. Looking mod r, x to the p21 is 1, so xm = 1. The same thing happens mod s, and by the chinese remainder theorem, xm = 1 relative to q(x). And this happens for every polynomial that is coprime to r and s. Everything to the m is 1, even though q is reducible. This is a carmichael polynomial, similar to a carmichael number. Here is an example mod 7.
Ok that didn't help; how about a more direct proof of irreducibility. If m can be factored, you can prove primality just as we did for the integers. Let t be a prime dividing m and evaluate xm/t. If this is 1 then x has a shorter cycle and is not primitive. If it is not 1, for every t, then the period of x is a factor of m, but not a factor of m/t for any prime t, hence the period is m. The powers of x are distinct all the way up to m. Everything in F is invertible, and F is a field. That makes q(x) irreducible, and a valid representation for F. The math is efficient, even if p and q are large. The only tricky part is the factorization of m. This may not be practical if the field is large.
There is another proof of irreducibility, that doesn't require a factorization of pn1, and doesn't require that x be primitive.
If q(x) is a constant it is a unit, and does not meet the definition of irreducible.
If q is linear, i.e. xc for some integer c, q is irreducible.
If q, of degree > 1, has no constant term, then q is divisible by x, and is not irreducible.
Next, take the gcd of q(x) with xp11. All the linear polynomials xc, multiplied together, create a polynomial whose roots are, precisely, the p1 roots of 1. At the same time, xp11 has distinct roots, and they are all p1 roots of 1. Therefore xp11 has 1 through p1 as roots, and splits into p1 linear pieces xc for each nonzero integer c. If q(x) is divisible by any linear factor xc, the gcd will bring this out. Conversely, if q has no such factors, the gcd will be 1.
If q has degree 2 or 3 we are done, for if q is reducible than at least one of the factors is linear, and we already ruled that out. So assume the degree of q is at least 4.
Let L be the finite field of order pk. For notational convenience let s = pk1. L is 0 and all the roots of xs1. This polynomial contains plenty of irreducibles of degree k. Suppose there is another one h(x) that is not here. Let u be a root of h(x), and u defines another subfield of size pk. Two such subfields cannot exist, hence xs1 contains every irreducible polynomial of degree k.
In the same way, xs1 contains all the linear factors xc, and all the irreducibles of degree d for every d dividing k, because those subfields are all present in L, and are unique within F.
Take the gcd of q and xs1. In one swoop you have pulled out the irreducibles of degree k, or d dividing k, that live in q. If you find any, then q is not irreducible.
Apply this test as k runs from 1 to half the degree of q, and if nothing falls out, then q is irreducible.
k = 1: linear factors
k = 2: quadratic factors k = 3: cubic factors … k = d/2: half the degree of q  q(x) 
If q has degree d, you can let k run from d/4+1 to d/2. Suppose q has an irreducible factor of degree j. If j is d/4, it will be found when k = d/2. If j is smaller, it is a divisor of something between d/4 and d/2. This irreducible factor will be extracted at some level k.
Another shortcut tests for linear factors first, k = 1, and remembers g(x) = xp1 mod q(x) along the way. At each level k, divide s by p1. Raise g to the s/(p1) mod q. This is the same as raising x to the s mod q, so subtract 1 and proceed with the gcd algorithm.
If q has several factors of h, where h is irreducible, the gcd algorithm will pull out only one instance of h. That's because xs1 has only one instance of h. That's ok; I'm not trying to find a complete factorization for q, I just want to know if it's irreducible.
The gcd of q and xs1 might be q. This could happen if q consists entirely of distinct irreducible factors of degree k. That's ok, it proves q is reducible.
This is efficient, even if p is a thousand digits long. Yes, xs1 is a really big polynomial, but the first iteration of the gcd algorithm divides xs1 by q, and finds a remainder whose degree is less than that of q. From here the polynomials are manageably small.
Apply the procedure described in the previous section, and assume all the factors of q have degree k.
If q has degree k itself, then q is irreducible. Otherwise it is composite.
Let F be the finite field of order pk. Let h(x) be a factor of q(x), and let u be a root of h. Let b be primitive in F, and let u = bj. In other words, j = log(u).
Let g be the gcd of j and pk1. Thus g determines the cycle that u lives in. If g = 1 then u is primitive, just like b. If g = 2 then u is a square, and cannot be primitive. If g = 3 then u is a cube, and so on.
Apply the frobenius automorphism, which takes u to its next conjugate up. The log is multiplied by p, that is, j → jp. Since p is coprime to pk1, the gcd does not change. The conjugate lives in the same cycle as u itself. In fact all the roots of h live in the same cycle. They are all primitive, or all squares, or all cubes, etc.
Let s = (pk1)/c, and to keep things simple, let c = 2. Is be a root of xs1? No  because b has to go all the way to pk1 to reach 1. However, b2 is a root of xs1. All the powers of b2 are also roots, and these are all the roots we are allowed. Therefore, the roots of xs1 are the squares in F, and the nonsquares are left on the cuttingroom floor.
Take the gcd of q and xs1. If the roots of h are squares, and the roots of some other factor of q are nonsquares, then h is pulled out, and q is reduced.
Other values of c can be used to separate q. Set c = 3, whence s = (pk1)/3, and xs1 holds all the cubes. Take the gcd of q and xs1 to separate the factors whose roots are cubes from the factors whose roots are noncubes. This can be done for any value of c that divides pk1. However, I have found that c = 2 is usually sufficient.
At this point we see why p has to be odd. If p = 2 and k = 3 then pk1 is 7, which is prime. There is no value of c to partition the units of F, and no way to separate the factors of q.
Even if p is odd, such as 59, q might consist of three factors, all with roots that are squares. And it's very likely that none of these roots are 29th powers, so that looks like the end of the game, but it's not.
Let x = y+1, and substitute for x in q(x), giving a new polynomial q(y). This is called a shift. If q(y) factors then replace y with x1 and q(x) factors. The prime factorization of the shift is the shift of the prime factors, but all the roots are shifted by 1. If all the roots were squares to begin with, then the shift will make some of them squares and some of them nonsquares, seemingly at random. The shifted version of q can be factored through xs1, and the factors can then be shifted back. If the gcd doesn't yield any fruit, shift again, and again. You can shift up to p times, looking for separation. It almost always works. If not, you could use any linear function such as x = 3y+8, though I have not tried that variation.
There is a catch. If q = h2, then the roots shift together, and h is never separated from h. This problem is solve by extracting repeated roots at the outset using formal derivatives. See the next section for more details.
Even if the factors are distinct, it is possible to construct a pathological composite polynomial. This is likely to happen mod 3, because you only have 3 chances to shift the roots. Find 9 different irreducible polynomials of degree 4, (there are (819)/3 = 24 to choose from), and the shifted roots of any one of these polynomials are going to be square or nonsquare, 3 times, giving 8 possibilities. Two of them are going to come out the same, and cannot be separated by this method. Here is my example; and it doesn't help to bump c up to 4, 5, or any of the factors of 80, because the roots shift together. It's not a foolproof algorithm, but almost.
A number with thousands of digits cannot be factored, but a polynomial mod p with thousands of terms can be factored quickly and easily. This is quite a surprise, and a lovely result. My software, made available in the prolog of this book, includes this factorization algorithm.
The routine is recursive, and is called with a polynomial, a multiplicity, a degree, and a shift value. Thus a polynomial is separated into two pieces, and those two pieces are factored, and so on; printing a list of irreducibles at the end.
At the start it is called with a polynomial q(x), a multiplicity of 1, a degree of 0, and a shift value of 0. Please factor q(x)1, with no shift, looking for factors of any degree.
If q is a constant then return; there are no factors.
If q has no constant term, divide q by a power of x, and put this many factors of x on the list, x being irreducible.
If q is linear, of degree 1, then put it on the list and return; it is irreducible. Remember to tag q with its multiplicity when placing it on the list.
If degree is unspecified and the multiplicity is 1, perform a multiplicity analysis. Set y = q; we will need the original polynomial q later. At the top of the multiplicity loop, let y′ be the formal derivative of y. If this is 0 then all the exponents are divisible by p. By the frobenius homomorphism, y is a smaller polynomial raised to the p power. Divide all the exponents by p, multiply the multiplicity by p, and go back up to the top of the loop.
With y′ nonzero, let g be the gcd of y and y′. If g is 1 then break out of the loop; otherwise set y = g and go back up to the top.
If q is a product of distinct prime factors, then q is not something else to the p power, and q′ will have nothing in common with q. Conversely, assume q is s*hj, for some maximal polynomial h and multiplicity j > 1. Assume p divides j. If hj is all of q then q is reduced. If s is nontrivial then q′ = s′*hj + s*j*hj1*h′. The second term goes away, and hj is common to both. Since s′ is smaller than s, the gcd is proper, and q is reduced. If j is not divisible by p then hj1 is common to q and q′, and q′ is smaller than q, hence q is reduced. At the end of the loop, q is reduced by some factor y with high multiplicity, or q consists of distinct prime factors.
If y is different from q then it is a factor. Divide ye out of q, and recursively invoke the algorithm with y and e. Then jump back up to the multiplicity loop with q/ye.
If q was not reduced, then it has no multiplicity. Separate it into chunks by degree. If a chunk y of degree k is pulled out, invoke the algorithm with y, and the current multiplicity, and k as degree. Then call the algorithm on the rest of q.
If everything has degree k, but q is still composite, separate by shifting roots, as described in the previous section, then call the algorithm on the two pieces with the shift value, so that the factors thereof can be unshifted.
Once q is factored, you need to run through the list and regroup. Here's why. Let q = s8*t7 mod 7. The first derivative reduces t8 down to t7, and that becomes (st)7. The next trip through the multiplicity loop reduces (st)7 to st. Break out of the loop, with y = st, and ask how many factors of y are in q. There are 7, and once y is factored, s7 and t7 are on the list. But the quotient q/y7 contributes another instance of s to the list. Clean up by putting s and s7 together, giving s8. We're just tidying things up a bit.
Let F be a finite field of order w, where w is a prime power, and let E be a finite extension of F. For example, let E/F have dimension 4, and select a basis of length 4 to express E as an F vector space. Now every element in E is defined by 4 coefficients a b c and d, relative to this basis.
Let z be a primitive root for E, and consider the norm of z. This is z times its 3 conjugates. The 3 conjugates are z raised to the w, w2, and w3. Thus the norm is z to the 1+w+w2+w3. Call this y, and raise y to the w1 and get z to the w41, or 1. y is a w1 root of 1, and y is primitive for F. Put this all together and z generates E*, and z generates F*.
Apply this to all the powers of z, and they map uniformly onto all the powers of y, which are all the nonzero elements of F. The powers of z wrap around the powers of y like thread around a spool. In fact the map is modular math; the exponent is reduced mod w1. The distribution is uniform, and everything in F* is hit 1+w+w2+w3 times.
z0  z1  z2  …  zw1  zw  zw+1  …  z2w2  z2w1  z2w  … 
y0  y1  y2  …  y0  y1  y2  …  y0  y1  y2  … 
Return to the representation of E as an F vector space. The norm, i.e. the product of conjugates, becomes some formula n(a, b, c, d). Evaluate n at every possible a b c and d, other than a = b = c = d = 0, and you get an even distribution across the nonzero elements of F. Each one is hit 1+w+w2+w3 times.
Apply this to the case of F adjoin the square root of m, where m is not a square in F. Let u2 = m, hence the conjugate of u is u. The norm of a+bu is a2mb2. As long as a and b are not zero, the distribution of a2mb2 is even across F*. Each element is hit 1+w times.
Instead of the square root of m, adjoin the cube root of 1. This looks like an extension of dimension 3, but it is actually quadratic, with x2 = x1. Review the Eisenstein integers. The norm is now a2ab+b2, and it spreads evenly when x2+x+1 is irreducible over F. Each element is hit 1+w times.
All good, but what if the extension E/F is not a field? What is the distribution of a2mb2 when m is already a square in F? Sure, you can set E = F[x], polynomials in x over the field F, where x2 collapses back to m, but if s is the square root of m in F, then (xs) * (x+s) = x2m = 0. E contains zero divisors, and E is not a field.
Still, E gives us a clue to the distribution. Add and multiply in E by adding and multiplying polynomials in x, then replace x2 with m. It's just another flavor of modular math. So E is a ring extension, with basis 1 and x, having coefficients a and b drawn from F. The elements of e are F cross F, and the extension is 2 dimensional.
Map E into F by replacing x with s. This respects addition and multiplication, because s2 = m, just like x2. Everything works the same when x is replaced with s. But of course a+bs lies in F, hence a homomorphism from E into F.
Then there's another homomorphism from E into F via x → s. Put these two homomorphisms together, side by side, and find a homomorphism from E into F cross F. Yes, it respects addition and multiplication, because it does so into the first copy of F, and the second copy of F.
What is the kernel of this homomorphism? What part of E maps to [0,0] in F cross F? When are a+bs and abs both 0? If the characteristic of F is not 2, then solve for a and b and get 0. Only 0 in E maps to [0,0] in F cross F. The map is an embedding. And since E is a 2 dimensional F vector space, and F cross F is a 2 dimensional F vector space, the map is actually an isomorphism. If this proof looks familiar, it should; it is just an instance of the chinese remainder theorem applied to polynomials.
Our expression, a2mb2, is (a+sb) * (asb), which is the image of E into the first copy of F, times the image of E into the second copy of F. Well E covers all of F cross F, so as a and b take on all their values, the product becomes everything in F times everything in F. This is 0 when either side is 0; which happens 2w1 times. On the other hand, each nonzero product c is some nonzero v in the first copy of F, times c/v in the second copy of F. That happens w1 times, and it happens the same way for every c. The distribution is even across the nonzero elements of F, just as it was when E was a field.
In summary,a2mb2 spreads evenly across F*. If m is not a square in F, norm distribution tells us there is one instance of 0, with a = b = 0, and everything in F* is hit 1+w times. As a sanity check, 1 + (w1)*(1+w) = w2; all the pairs [a,b] are accounted for. Now, if m is already a square in F, 0 is hit 2w1 times, and everything in F* is hit w1 times. Again the math works out: 2w1 + (w1)*(w1) = w2.
The distribution of a4mb4 is even if w is 3 mod 4. At first this seems mysterious, but when w is 3 mod 4, squaring the squares merely permutes them. We still have all the same combinations of a2 and b2, in perhaps a different order, and the distribution remains even. In the same way, a6bm6 is even if w is 2 mod 3. You can explore these and other distributions with this program.
To be complete, let F have characteristic 2. Take away 0, and there are w1 units, which is an odd number. If the order of F is 32, there are 31 units. Select a primitive root, and discrete logs will run from 0 to 30. You can always divide a log by 2. Half of 10 is 5, and half of 7 is 19. Thus everything in F has exactly one square root. Given any value c, select any b, and there is a unique a such that a2 = c+mb2. Every c in F is hit w times, once for each b. The distribution is still even across the nonzero elements of F; in fact this time it is even across all of F.
Recall the descending groups of matrices: all matrices, invertible, determinant ±1, orthonormal, and orthonormal with determinant 1 (rotations). When the entries come from a finite field F, the number of matrices is finite. We now have the machinery to calculate the size of each group. Let w be the order of F, where w is a prime power. 

If the matrix is n by n, any of w elements can be in any of n2 positions, hence there are w to the n2 possible matrices.
If M is invertible, build n linearly independent rows, one at a time, starting at the top. The first row can be anything but 0, hence there are wn1 possibilities. Once selected, the second row can be anything other than a scale multiple of the first. That leaves wnw possibilities. The third row can be anything other than the space spanned by the first two rows. This space has size w2, hence there are wnw2 possibilities. Continue down to the bottom row, which can be filled in wn  wn1 ways. Multiply these expressions together to count the invertible matrices.
Pull the powers of w out of each expression, and find w raised to the (n2n)/2. That leaves (w1) × (w21) × (w31) × … (wn1).
If the determinant is 1 then the last row, the bottom row, has to be scaled by 1/det(M). The scaling factor is anything from 1 to w1, so throw away w1 in the previous expression to count matrices with determinant 1.
w(n×nn)/2 × (w21) × (w31) × … (wn1)
Double this if you want a determinant of ±1. Or just leave it alone if F has characteristic 2, because 1 and 1 are the same.
Orthonormal is trickier, as you might imagine. Approach this by counting vectors of certain lengths. How many vectors are there of length 0, or 1, or q, where q is your favorite nonsquare in F. This is all you need, because a vector of length 1 can be scaled by s to give a vector of length s2, and a vector of length q can be scaled to give a vector of length qs2. If the counts are z a and b, for 0 1 and q respectively, then z + (w1)/2 × (a+b) has to equal wn, accounting for all the vectors.
In characteristic 2 you don't have to worry about q, since everything is a square. Thus z + (w1) × a = wn.
I will use the notation On for orthonormal of determinant 1 in n dimensions. Let's start by analyzing O2.
Let the top row of a matrix in O2, having length 1, be [a,b]. If either is 0 then the other is ±1. Put ±1 in the opposite corner to complete the matrix. If a and b are nonzero, and if the two rows are to be orthogonal, both entries in the second row are also nonzero. Scale the bottom row so that the lower right entry equals a. (We'll scale it back later if necessary.) The antidiagonal elements must now be opposite. [a,b] and [b,a]. This has a determinant of a2+b2, which is already 1, since the top row has length 1. We didn't need to scale the bottom row after all. The matrices of O2 correspond to the pairs a,b such that a2+b2 = 1. You can think of these as vectors on the unit circle in the complex plane. The top row holds this vector, while the second row, the second vector, always runs 90 degrees ahead of the first. This is only an analogy, since F is a finite field, rather than continuous rotations of the xy plane.
What is the size of O2? It is the number of vectors of length 1, the number of pairs [a,b] with a2+b2 = 1. The previous section tells us the distribution is even across all of F*, so start with w2, take away the pairs that yield 0, and divide by w1. Of course 0 yields 0, and if any other a2+b2 = 0, then a/b is a square root of 1. This happens iff w = 1 mod 4, hence this formula, and many formulas to come, depend on w mod 4. The size of O2 is w (characteristic 2), w1 (w = 1 mod 4), and w+1 (w = 3 mod 4). Double this if you want reflections, orthonormal matrices with determinant 1. In this case the bottom row is a vector that runs 90 degrees behind the first vector.
Take arbitrary matrices [a,b],[b,a] and [c,d],[d,c] and verify that multiplication commutes, like rotations in the plane. In fact the top row carries all the information; the bottom row just goes along for the ride. This will not be the case in higher dimensions.
To count On, consider the orthonormal matrices with a predefined top row r. Let S be one such matrix, the standard matrix. Premultiply by any matrix that is 1 in the upper left and orthonormal in its lower right block of size n1. The top row is still r, and the resulting matrix is orthonormal. This builds a map from orthonormal matrices of dimension n1 into orthonormal matrices of dimension n with a top row of r.
Next, reverse this map, thus building a bijection. Take any orthonormal matrix M with top row r and multiply on the right by S inverse, (also known as S transpose), and the result is orthonormal, with 1 in the upper left. Therefore the number of matrices in On with top row r is equal to the number of orthonormal matrices of dimension n1. Of course this assumes a standard matrix S exists.
Look at the case n = 3 over a finite field F of size w. How many top rows r are there? Remember that the length of r is 1. When the upper left entry is ±1, the other two have squares that sum to 0. This contributes 4w2, or 2, as w is 1 or 3 mod 4. For anything else in the upper left, the remaining two squares sum to something nonzero. We showed earlier that sums of squares are distributed evenly across the nonzero entries of F. This yields w2 times (w1 or w+1), as w is 1 or 3 mod 4. Add these up and get w2+w or w2w, as w is 1 or 3 mod 4. Finally, multiply this expression by the size of O2, and get w(w1)(w+1), regardless of w mod 4. That is the size of O3, but we still have to demonstrate a standard orthonormal matrix S for each row r.
Start with r, as the first vector in n space, and extend this to an n dimensional basis. Scale the bottom row, so that the determinant is 1. Then apply the gram schmidt process to create an orthogonal matrix. The determinant is still 1.
Dot each row with itself, and multiply the results together, and get the square of the determinant, which is 1. Thus an even number of dot products are nonsquares.
Scale each row independently, so that its self dot product is 1 or q, where q is your favorite nonsquare in F. (At this point the determinant might not be 1.) If all dot products are 1 then S is orthonormal, and we're done. Otherwise consider a pair of rows with dot products equal to q. Call these two vectors u and v. Replace u with au+bv, and v with bu+av, where a and b are arbitrary scalars in F. Dot either of these linear combinations with any other row and the result is still 0. Verify that (au+bv).(bu+av) = abq  abq = 0. The matrix is still orthogonal. Now evaluate (au+bv).(au+bv) and get (a2+b2)q. Similarly, (bu+av).(bu+av) = (a2+b2)q. Since sums of squares are evenly distributed across F*, there is an a and b such that a2+b2 = 1/q. The two rows now have length 1. Do this for every pair of nonsquare rows, and S becomes orthonormal. That completes the proof.
Here is some notation to help us move through higher dimensions. Let on be the number of matrices in the group On. Let zn be the number of n dimensional vectors with length 0, and let yn be the number of n dimensional vectors with length 1. We know that y2 would be the same for any nonzero length, because a2+b2 spreads evenly across F*. This holds for even indexes, but not for odd indexes. When n is odd, let qn count the number of vectors whose length is a given nonsquare.
Here is a review of what we know so far. When it matters, w = 1 mod 4 is given first, then w = 3 mod 4.
z1 = 1, y1 = 2, q1 = 0, o1 = 1.
z2 = 2w1  1, y2 = w1  w+1, o2 = w1  w+1.
z3 = w2, y3 = w2+w  w2w, q3 = w2w  w2+w, o3 = w3w
When n is even, build zn in two ways. Let the first n2 entries have squares that sum to 0, and then bring in the last two entries. This gives zn2z2. Or the first n2 entries could produce something nonzero, and the last two must take up the slack.
zn = zn2z2 + (w1)yn2y2
Similar reasoning holds for yn.
yn = zn2y2 + yn2z2 + (w2)yn2y2
Even though z2 and y2 depend on the value of w mod 4, z4 and y4 do not.
z4 = w3+w2w, y4 = w3w, o4 = w2(w1)2(w+1)2.
Verify that our formula for y4 comes out the same for a nonsquare length. Like y2, y4 is the same across the nonzero lengths. In fact, as you step through n = 6, 8, 10, and so on, qn always equals yn. The distribution of vectors is even across the nonzero lengths, though it may still depend on w mod 4.
When n is odd we need some different formulas. Instead of separating the first two entries, make the first entry 0 or nonzero and derive the following.
zn = zn1 + (w1)yn1
yn = (w2)yn1 + 2zn1
qn = wyn1
z5 = w4, y5 = w4+w2, q5 = w4w2, o5 = w4(w1)2(w+1)2(w2+1).
If you move on to dimension 6, use the formulas based on z4, y4, z2, and y2. In 6, as with 2, the formula for o depends on w mod 4.
For dimension 7, use the "odd" formulas, as we did with 5. Continue this process as far as you like.
z6 = w5+w3w2  w5w3+w2, y6 = w5w2  w5+w2, o6 = y6o5
z7 = w6, y7 = w6+w3  w6w3, q7 = w6w3  w6+w3, o7 = y7o6
z8 = w7+w4w3, y8 = w7w3, o8 = y8o7
When w is even, i.e. a power of 2, zn = yn = wn1, and on = wn×(n1)/2. This is left as an exercise.
Like a finite field, O2 has a primitive matrix U that generates all the others.
U, u2, u3, u4, u5, … 1
The top row of a matrix in O2 is a unit vector [a,b], such that a2+b2 = 1. The bottom row is [b,a], but that just goes along for the ride, and carries no new information.
Multiply two matrices, with top rows [a,b] and [c,d], and get a matrix with top row [acbd,ad+bc]. Does this look familiar? It is the formula for multiplication in the complex plane. If i is the square root of 1, then a+bi * c+di = (acbd) + (ad+bc)i. It's exactly the same formula.
Assume F has no square root of 1. Thus F[i] is a true field extension, giving a new field E, whose members can be written a+bi. E has a primitive root z, with log 1. Let y = zw1, where w is the order of F. Thus log(y) = w1. The powers of y have logs that are multiples of w1, all the way up to (w1) times (w+1), which is w21, which is the same as a log of 0, giving 1 in E. Write y as a+bi, and y seeds a cycle of matrices, having a period of length w+1. We need to show that y, and thus the powers of y, are precisely those elements with a2+b2 = 1. This is the formula for the norm of E over F. When does an element have norm 1? We showed earlier that the norm of z is zw+1. The elements with norm 1 are the w+1 roots of 1, which are the powers of y. The matrix associated with y is primitive for O2.
If F already has a square root of 1, then E = F[i] is isomorphic to two copies of F running in parallel, by the chinese remainder theorem. If (a+bi) times (abi) = 1, then apply the two homomorphisms, and the element of F on the left, times the element of F on the right, equals 1. The right is always the inverse of the left, so focus on the left, a+bs, where s is the square root of 1 in F. Multiplication upstairs becomes multiplication downstairs, and F (downstairs) has some primitive root y. The preimage of y, a+bs, written as a+bi, defines the matrix that generates O2.
Finally let F have characteristic 2. This time the size of O2 is w, or 2n. Square a+bi and get 1. That's right, everything is a square root of 1; everything is an involution. And multiplication commutes. Therefore O2 is the direct product of n copies of Z/2. Yeah, it's weird, but it doesn't come up very often.
Is there an efficient procedure to find unit vectors, i.e. members of O2? There is, and it works mod n, even if n is a thousand digits long and its factorization is not known.
Choose random values of c, just above the square root of n for efficiency, then ask whether c2 mod n is a prime that is 1 mod 4. As soon as you find one, solve for a and b such that a2 + b2 is equal to this prime. That gives a2 + b2 = c2 mod n. Divide through by c2 and relabel the variables so that a2 + b2 = 1. Given a few such unit vectors, at least one is likely to be primitive, i.e. a generator of O2.
Here is a brief diversion into continuous math. What does O2 look like over the reals? The orthonormal vectors lie on the unit circle, which I will place in the complex plane. As shown above, multiplying two matrices in O2 is the same as multiplying their toprow vectors in the complex plane. By Demoivre's Formula, this adds their angles. Thus O2 is isomorphic to θ from 0 to 2π, with matrix multiplication corresponding to angle addition. As a topological space, O2 is a circle.
What about SL2(R), that is, the 2×2 matrices with determinant 1? This includes O2, so intuition suggests it is based on a circle.
The geometric way to think about it is to start with 2 vectors u and v in the plane having determinant 1. Spin them around in the plane so that u runs along the positive x axis. This is the parameter θ that defines the circle. Now u is [a,0] and v has to be [?,1/a] to give a determinant of 1. Thus v is in the upper half plane. Its y coordinate is 1/a and its x coordinate is undetermined.
Add a multiple of u to v so that it points straight up the y axis. This is any real number, in other words, a line. So far we have a circle and a line.
This translation, adding a multiple of u to v, does not change the determinant of u and v; the determinant is still 1.
Finally, scale u by 1/a and v by a to get the unit vectors [1,0] and [0,1]. The parameter a is any positive real number, which is a positive ray. Since this ray is open at either end, it is topologically equivalent to a line. Therefore SL2 is homeomorphic to a circle cross a line cross a line, or a circle cross a plane. If you're a tiny fly, it looks like 3space. You can jiggle 3 of the 4 entries in the matrix any way you like, and the fourth entry moves in sync to keep the determinant 1. You have 3 dimensions of freedom. It looks like 3space locally, hence it is a 3manifold, but if you travel far enough you can come back where you started, tracing the circle of θ. This is not the same as xyz 3space.
Since orthonormal matrices preserve lengths, they map unit vectors to unit vectors. Multiply u by a matrix M and get another unit vector v. Since M is invertible, the action is a permutation. The product of two matrices induces the first permutation followed by the second, which is the composition of the two permutations. This is an instance of a general topic, groups acting on sets, which is coming soon.
Consider the action of O2. Let M be a matrix in O2, hence the top row of M is a unit vector, and the bottom row goes along for the ride. The action of M multiplies the top row by the unit vectors, as though they all lived in the complex plane.
Let U generate all of O2. Now U shifts all the unit vectors around in a cycle, just as U shifts all the matrices in O2 up by 1. In fact the top row of these matrices are the unit vectors. This is a cycle on an even number of elements, hence it is an odd permutation. O2 induces both even and odd permutations on unit vectors, and the even permutations are induced by the squares in O2, i.e. the even powers of U. We will see that the same thing happens in O3, but it's more complicated.
[ A circle of 12 dots, with arrows pushing them around counterclockwise, and U above one of the arrows. ]
Let M be the 3 by 3 matrix with U (as above) in the upper left and 1 in the lower right. Thus the order of M is w1 or w+1, as driven by U in O2. M is in O3, and when M acts on the vector [x,y,z], it always maps z to z. If x and y are 0 then M fixes z. (This makes z 1 or 1.) What other vectors are fixed by M?
Assume either x or y is nonzero. If the top row of U is [a,b], then the action of U on [x,y] is [axby,ay+bx].
axby = x
ay+bx = y
x = by/(a1) { a is not 1 in U }
ay+b(by/(a1)) = y { substitute for x in the second equation }
ay+(1a2)y/(a1) = y { a2 + b2 = 1 }
ay(1+a)y = y
y = y
y = 0 { assume not characteristic 2 }
ax = x { y = 0 in the first equation }
x = 0 { a is not 1 }
This isn't specific to M; any power of M, that is not the identity matrix (with a = 1), fixes only [0,0,±1].
Imagine a unit vector s that is mapped to the same vector by two different powers of M representing two different matrices in O3. If s is a vector with x or y nonzero, and sMi = sMj, then Mij fixes s. That can't happen, hence the powers of M map s to distinct vectors. Seen another way, s, sM, (sM)M, (sM2)M, etc are distinct, until we get back to s. M pushes these vectors around in a cycle of length w1  w+1. Each of these cycles is an odd permutation.
How many such cycles are there? There are w2+w  w2w unit vectors, and two of them are fixed by M. Subtract 2 and divide by w1  w+1, giving w+2  w2. There are an odd number of these cycles, and the permutation is odd. Therefore O3 induces both odd and even permutations.
Let S be the set of matrices that induce Even permutations, and let T be the set that induce odd permutations. There is a bijection between these two sets; multiply by M, or by M inverse, to go back and forth. Half of O3 is even and half is odd, as you would expect.
All the squares, i.e. a matrix times itself, belong to S. In fact S consists precisely of squares, but we'll need a counting argument to prove that.
O3  


S  T 
Review the monomorphism from the projective quaternions with norm 1 into O3. How many quaternions have norm 1? These are vectors of length 4 with norm 1, and the answer is w3w. Since each quaternion is equivalent to its opposite, divide this by 2. And how big is O3? The answer is once again w3w. Exactly half of these are images of quaternions with norm 1. At the same time, each one of these has a square root in O3. Half of O3 are squares, and these induce even permutations, while the nonsquares induce odd permutations. The squares fill up all of S, leaving the nonsquares for T.
When the base is a finite field, the quaternions are the same as the 2 by 2 matrices. In other words, they are isomorphic. One is merely a relabeling of the other. This is not the case over the reals, or any other ordered field. Recall that the quaternions over the reals form a division ring, wherein everything nonzero is invertible. Yet there are plenty of 2 by 2 matrices that are not 0, and not invertible. Put anything you like in the top row, and [0,0] in the bottom.
Characteristic 2 fails as well. The quaternions commute, but the matrices do not.
We want a ring homomorphism from the quaternions into the 2 by 2 matrices, hence 0 maps to 0 and 1 maps to the identity matrix. Select images for i j and k, and the entire map is determined. Of course i j and k must respect i2 = 1, ij = k, and so on. Here is my map, and I'm sure it's not the only one out there.
1 → 

i → 

j → 

k → 

Assume 2x2  2vx + v2 + 1 = 0, so that j2 = 1.
Verify that i2 = 1, j2 = 1, ij = k, and ji = k. It turns out this is all you have to do.
k2 = (ij)(ij) = ijij = iijj = 1
jk = jij = jji = i
kj = ijj = i
ik = iij = j
ki = iji = jii = j
Given v, can we find x? The discriminant of the quadratic is 4v28. We can solve this whenever v22 is a square. Remember that v2 covers half the nonzero values in our finite field, along with 0. Negate this set and subtract 2 to find a new set of the same size. This has to cover at least one square. There simply aren't enough nonsquares to accommodate a set of this size. Therefore there is a v, and an x, that brings this homomorphism to life.
The map can be derived mod n, even if n is a thousand digits long and its factorization is not known. Select values of c at random, just below the square root of n for efficiency, until nc2 is a prime that is 1 mod 8. Find a and b such that a2+2b2 is this prime. Thus a2 + 2b2 = c2. Divide through by b2 and relabel the variables to get a2  2 = c2. Let a play the role of v in the above, and bring the homomorphism to life.
To reverse this map, to go from matrices back to quaternions, you need to invert the following matrix.
1  0  0  1 
1  1  2  1 
v  x  2x2v  v 
v2x  vx  2x  2xv 
The determinant is 4, which is a unit. The inverse is the following matrix divided by 2. This proves the map is an isomorphism. The quaternions and the 2 by 2 matrices are really the same thing.
1  1  v  2xv 
0  2  2v2x  2x 
0  1  x  xv 
1  1  v  v2x 
Amazingly, the norm of a quaternion is the determinant of its image. Start with a quaternion a+bi+cj+dk and turn it into the following matrix.
a  b + cv + dv  2dx  b + cx + dv  dx 
2b + 2cx  2cv + 2dx  a + b  cv + 2dx  dv 
Then use this program to find the determinant, which comes out a2 + b2 + c2 + d2.
The quaternions with norm 1 become the matrices with determinant 1. When the base field is finite, we already calculated the size of both groups; they agree at w3w.
As shown above, there is a solution to x2  2xv + v2 + 1 = 0 mod p. In particular, there is a discriminant z such that z2 + v2 + 2 = 0. Both variables cannot be 0 simultaneously. Without loss of generality, assume z is nonzero. Extend this equation to p2, where v takes on any value mod p2 that is compatible with v mod p. z is still correct nod p, but may need to be adjusted to be coorect mod p2. For instance, add 3p to z, and you add 6zp to z2. Thus the equation is satisfied mod p2, and the quaternions and the 2 by 2 matrices are isomorphic mod p2.
Continue this process up to p3, p4, and so on up to pk. Thus the quaternions and the 2 by 2 matrices mod pk are isomorphic.
Do this for several prime powers in parallel, and find an isomorphism mod n by the chinese remainder theorem.
Push the relationship up forever, through each level pk, until v, z, and x are infinite series of values mod pk. This establishes the isomorphism over the padic numbers. In fact, using these values of x and v, the isomorphism holds for the fraction field.
Finally consider the complex numbers. You can take the square root of anything, so v leads to z, leads to x, leads to an isomorphism.
The algebra that equates norm and determinant is still valid, hence the norm of the quaternion equals the determinant of the corresponding matrix, over any field.
When the underlying field has a topology, like the padic numbers or the complex numbers C, then the quaternions over C also have a topology, implied by the direct product of 4 copies of C. The 2 by 2 matrices over C have the same topology. The map in either direction is linear and continuous. A bicontinuous isomorphism is a homeomorphism; the topological rings are the same.
Return to a finite field, and the quaternions with norm 1 are isomorphic (under multiplication) to the 2 by 2 matrices with determinant 1. Equate each quaternion with its opposite, and each matrix with its opposite. These two groups are now isomorphic to the "even" orthonormal matrices of O3. A theorem in one world carries over to the other two.
Matrices det 1 / ±1 
⇔  Quaternions norm 1 / ±1 
⇔  O3 even 
I will close this chapter on finite fields with a very interesting structure, the algebraic closure of the integers mod p.
Call this F, and suppose F is finite. Let b be primitive in F. Since b has no square root, F is not closed. (If p = 2, take some other root of b.) Therefore F is infinite.
In fact F is countably infinite, since the polynomials mod p are countable.
For each integer k, there is but one subfield of F of dimension k. If there were more then there would be too many roots for the polynomial x to the pk1 = 1.
The finite subfields of F form a lattice that corresponds to the integers under factorization. The field of dimension 2, and the field of dimension 3, have to live in the field of dimension 6, or a multiple of 6, hence the lcm of the dimensions gives the smallest field containing a set of finite fields. Similarly, the gcd of the dimensions gives the intersection of a set of fields.
Bring in these finite fields one at a time, building an ascending chain of fields. Each has a dimension that is the lcm of the first n integers. Let E be the union of this ascending chain of fields. Clearly E belongs to F. Suppose E is not F. Some u, not in E, is algebraic over E. This makes u algebraic over Z/p. If the dimension of u is n, then u is folded into E at step n or earlier. Therefore E = F.
You can build a proper infinite subfield of F; bring in all the finite fields of even dimension.
Without 0, each subfield is generated by a primitive root. This implies a multiplicative cycle, and these cycles get larger and larger, each containing the previous, as you climb up any chain of subfields in F.
Suppose F* is generated by a finite set of elements. These elements live in a finite field, and cannot generate all of F.