Given a ring R, the n×n matrices over R can be added, scaled, and multiplied. If M is such a matrix, we can talk about M + 3, 7M - 11, or M3 - 5M2 + M + 21. In other words, a matrix can be substituted into a polynomial, just like a scalar from R.

The coefficients of the polynomial usually come from R; this will be the default assumption. If you want everything to be a matrix, change a coefficient, like 5, into 5 times the identity matrix. Now 5 can be multiplied by M2 using matrix multiplication. It's really the same whether you turn 5 into a matrix or leave it as a scalar, as long as you know how to apply the action of 5 to M2.

This is an example of nesting structures within structures, as described in chapter 1. In other situations I may refer to a matrix of polynomials, wherein each entry of a matrix is a polynomial in one or more variables. Such matrices can be added and multiplied by performing ring operations on the underlying polynomials. In general, the entries of a matrix could be functions on real or complex variables. A matrix of functions is, equivalently, a function of matrices. The matrix associated with x = 3 has, as entries, fi,j(3), for each function in the matrix. A matrix can also contain sequences, and again, this is equivalent to a sequence of matrices. I'll get to these later. For now, let's return to p(M), where p is a polynomial and M is a matrix.

Let M be a square n×n matrix over a ring R. The order of M is j if j is the smallest positive integer such that Mj = 1. (I'm using 1 to represent the identity matrix.) If there is no such j, then the order of M is 0. This is consistent with the definition of order within a group, in this case the group of nonsingular matrices under multiplication.

If Mj = 1 then Mj-1 is the inverse of M, and M is nonsingular.

Let Y be similar to M. In other words, Y = QM/Q, where Q is an invertible matrix. Verify that Yj = QMj/Q. At the same time, Mj = (1/Q)YjQ. Thus Mj = 1 iff Yj = 1. Similar matrices have the same order.

If M is a large matrix, it is often worth while looking for a nonsingular matrix Q that diagonalizes M. In other words, M = QD/Q, where D is diagonal. Now Mj = QDj/Q, and Dj is easily calculated, even if j is a real number. Thus M3.76 is well defined.

If M is similar to D, as described above, then the order of M is j iff all the entries in the diagonal matrix D are roots of 1, and the least common multiple of these exponents yields j.

Let M have an eigen value of e. Apply M twice and find an eigen value of e2. If e is not a root of 1, then the powers of M can never produce the identity matrix. Conversely, if M has a positive order than all its eigen values are roots of 1. This is affirmed by the diagonalization argument given above. The elements of D are the eigen values, and they are all roots of 1.

If R is the complex numbers, and the trace of M exceeds n, then at least one eigen value exceeds 1, and M has order 0. This because trace is the sum of the eigen values. The same holds if the absolute value of the trace of M exceeds n.

Let M be block diagonal. Raising M to the j raises each block to the j. Thus M has order j iff each block has an order dividing j, and the lcm of these orders yields j. This confirms the earlier results when M was diagonalizable. Each entry is its own block, and a root of 1, and together these elements must be raised to the jth power to reach 1.

For the rest of this chapter, R is an integral domain. Therefore M is similar to a matrix Y in jordan canonical form. Concentrate on a jordan block that is not pure diagonal. It therefore has l down the diagonal, and ones on the subdiagonal. If l is zero then this block eventually becomes zero, and M has order 0, so assume l is nonzero. Write the block as the sum of the diagonal matrix and the subdiagonal matrix. Raise this to the jth power by applying the binomial theorem. (The binomial theorem is valid for any commutative ring, and since the diagonal and subdiagonal pieces commute, we're ok.) The diagonal matrix is raised to the j, then the next term is the diagonal raised to the j-1, times the subdiagonal, times j. Since R is an integral domain this is always nonzero, unless the characteristic of R happens to be j, and I'll address that in the next chapter. The remaining terms bring in entries below the subdiagonal. The result is a block that always has something on the subdiagonal, and is never 1.

In conclusion, M has a positive order iff it is diagonalizable, and its eigen values are all roots of 1. The order of M is the lcm of these exponents.

If T is an n×n matrix, picture T, i.e. the entries of T, as a vector in n2 space. Similarly, T2 looks like a vector in n2 space, and so do all the higher powers of T. The first n2+1 powers of T, from T0 = 1 up to T to the n2, define n2+1 vectors in n2 space, and are linearly dependent. A linear combination of these vectors yields 0. In other words, T is the root of a polynomial with degree n2 or less.

Of course the coefficients of this polynomial could be fractions of R. We could fix this by multiplying through by a common denominator, but instead, let's just extend R to include its fractions. Now we can divide through by the lead coefficient, whence the polynomial becomes monic.

The minimum polynomial of T is monic, and has minimum degree. If there are two such polynomials, subtract them to find a polynomial of lesser degree with root T. Thus the minimum polynomial is well defined, for any T. If T is 0 let its polynomial be 0. Otherwise the minimum polynomial is nonconstant.

Let m(x) be the minimum polynomial, and let p(T) = 0. Use synthetic division to divide m(x) into p(x). If there is a remainder r(x), then r(T) = 0, and r(x) has degree less than that of m(x), which is impossible. Therefore m(x) is a factor of p(x). A polynomial p has root T iff p is a multiple of m.

The ring of polynomials over a field R is a pid, and the polynomials with root T form a principal ideal, generated by the minimum polynomial m(x).

Let Q be a nonsingular matrix that implements a base change. In other words, Y = QT/Q. We already showed Yn is the base change of Tn. Verify that base change commutes with addition as well. Put this all together and p(Y) is the base change of p(T), just as p(T) is the inverse base change of p(Y). One is 0 iff the other is 0. Therefore similar matrices are roots of the same polynomials, and exhibit the same minimum polynomial.

By schur's theorem, convert T to a lower triangular matrix, with its eigen values down the main diagonal. The diagonal of p(T) is now p applied to the diagonal elements. These must be 0 if p(T) = 0. Therefore the eigen values of T are roots of p, whenever p(T) = 0.

Take another step and convert T to Y, in jordan canonical form. Now p(Y) is p applied to each block. In particular, p(Y) is 0 iff p is 0 on each block. The polynomials that have a given block B as root are all multiples of the minimum polynomial for B, thus defining a principal ideal of polynomials. Intersect these principal ideals to find the ideal of polynomials satisfying Y. This ideal is generated by the minimum polynomial for Y, which is the lcm of the minimum polynomials for each block.

The Cayley Hamilton theorem, named after its co-discoverers, states that any given matrix T is a root of its characteristic polynomial. Stretching the definition just a bit, T is its own eigen value.

This does not mean the characteristic polynomial of T is the minimum polynomial. When T is the identity matrix, its characteristic polynomial is (x-1)n, while its minimum polynomial is x-1. Of course the minimum polynomial divides the characteristic polynomial, as you would expect when T is a root of both.

Build E by placing the variable x down the main diagonal and subtracting T. As you recall, the determinant of E gives the characteristic polynomial for T.

Let F be the adjoint of E. Thus E*F gives the determinant, which I will call p(x). More accurately, E*F gives a diagonal matrix, wherein each diagonal entry is p(x).

Write F as F0 + F1 + F2 + … Fn, where Fi retains only the coefficients on xi. Thus F0 is a matrix based on the constants of F, Fn-1 is the identity matrix, and Fn is 0, since no cofactors produce xn.

Now E*F equals (x-T) times the sum over Fixi, which is in turn equal to diagonal p(x). If ci is the ith coefficient on p, write ci = Fi-1 - TFi.

Premultiply the ith equation by Ti. The first few equations look like this.

c0 = - TF0
Tc1 = TF0 - T2F1
T2c2 = T2F1 - T3F2
T3c3 = T3F2 - T4F3

Tncn = TnFn-1 - Tn+1Fn

Add these equations together and the right side telescopes to -Tn+1Fn, which is 0, since Fn is 0. The left side becomes p(T). Therefore p(T) = 0, and T is a root of its characteristic polynomial.

If you didn't like that proof, here is another one, based on jordan forms. This assumes R is an integral domain, so is a tad less general.

Similar matrices satisfy the same polynomials, and they also have the same characteristic polynomial. If T is a root of its own characteristic polynomial, the same holds for every matrix similar to T. This includes Y, the jordan canonical form of T.

Let m(x) be the minimum polynomial of T, or of Y. This is the lcm of the minimum polynomials of the jordan blocks, so look at each block in turn.

Let a block be diagonal, with eigen value l. Its minimum polynomial is x-l, which is a factor of its characteristic polynomial (x-l)b, where b is the size of the block.

Next consider a jordan block with l down the diagonal and ones on the subdiagonal. Once again the characteristic polynomial is (x-l)b. Replace x with the jordan block and evaluate. The expression x-l eaves only the subdiagonal, and when this is raised to the b power, the result is 0.

Each block satisfies its characteristic polynomial, and the product of these polynomials gives the characteristic polynomial for Y, which is satisfied by all the blocks of Y, and by Y itself.

Let T be a matrix with minimum polynomial m(T), having degree d. Move Td to one side, and Td is equal to a polynomial q(T), having degree less than d.

Given a higher power of T, repeatedly substitute Td = q(T), until you have a polynomial of degree less than d. Thus mathematics is performed mod m(T). The result is a commutative ring.

Can we find T inverse in this ring? Remember that m divides the characteristic polynomial of T. Both these polynomials have a nonzero constant term iff all eigen values are nonzero, iff T is nonsingular.

Assume T is nonsingular, whence q(T) has a constant term. Look for a polynomial u(T), with degree less than d, such that u*T is a constant. Start by setting u = Td-1. Now u*T produces Td, which is replaced with q(T). Adjust u so that u, when shifted over by T, cancels everything in q except for the constant. Now a scaled version of u acts as the inverse of T. Square this to represent 1/T2, and so on.

Other polynomials in T may not be invertible; the ring might not be a field.

We have substituted matrices into polynomials; let's turn things around and place polynomials inside matrices. Thus each entry in an n×n matrix is a polynomial in x, with coefficients in a base field R. Replace x with an element of R to obtain a traditional n×n matrix. Note that substitution commutes with matrix operations. You can multiply two matrices, then substitute for x, or substitute for x first and then multiply the resulting matrices. The result is the same.

Since nonzero constants are units in the ring of polynomials over a field, a matrix of polynomials is invertible iff its determinant is a nonzero constant.

Generalizing the above, a matrix can contain arbitrary functions of x. Again, substitution for x commutes with matrix operations.

If the base field is real or complex, or some other metric space, we can talk about the limit of a matrix as x approaches c. Using the product topology, the matrix approaches a limit iff all its component functions approach a limit. The limit of the matrix is the limit of these functions, entry by entry, as x approaches c. In this sense the matrix is just a collection of functions, without any particular structure. A 5 by 6 matrix is not particularly different from a 10 by 3 matrix.

When the difference quotient (f(x+h)-f(x))/h is applied to a matrix of real or complex functions, the result is the difference quotient applied per entry. Thus a matrix is differentiated entry by entry, and similarly for integration.

Verify that differentiation and integration commute with matrix addition, and with scaling by elements of R. Thus differentiation and integration are linear operators on a vector space of matrices, of a fixed size, whose entries are functions drawn from a given set, wherein functions are differentiable, or integrable, and closed under addition and scaling by R. That's a lot of words to say something fairly intuitive. Add two matrices and you add their derivatives.

Put the structure back into matrices, so they can be multiplied together. Surprisingly, the product rule for differentiation still applies. Look at (FG)i,j, and take its derivative. This is the derivative of (the ith row of F dotted with the jth column of G). Apply the product rule term by term. Then spread this across the entire matrix, and the derivative of FG is F′*G + F*G′.

Don't assume that the derivative of F2 is 2FF′. After all, F and F′ may not commute. Build a 2x2 matrix with zeros on the diagonal, x in the upper right, and 1 in the lower left. Note that F′F has 1 in the upper left corner, while FF′ has 1 in the lower right.

If F is symmetric, then F and F′ commute, and the derivative of Fn is once again nFn-1F′.

If F is a differentiable matrix at the point x, and G is the inverse of F at x, then use the adjoint to write G as sums of products of entries in F, over the determinant of F. Since the determinant is nonzero, and differentiable at x, each entry in G is a differentiable function at x, and G is differentiable at x. Write FG = 1 and differentiate using the product rule, giving FG′ = -F′G, or G′ = -GF′G.

If F is symmetric, then G, defined by the adjoint of F, is also symmetric. With F′ symmetric, G and F′ commute. This gives the familiar formula G′ = -F′G2 = -F′/F2.

Instead of the real or complex numbers, let the functions of a matrix operate on the nonnegative integers. In other words, M is a matrix of sequences. Evaluate at t = 0, 1, 2, 3, etc, and find a sequence of matrices. The characterizations are equivalent, just as a matrix of functions is equivalent to a function that produces a matrix for every x.

Once again, a sequence of matrices converges to a limit iff the individual sequences converge. This holds for arbitrary metric spaces, as well as the real and complex numbers. Here again the matrix has no structure; it is just a collection of sequences.

Looking at the sequence of partial sums, a series of matrices has a well defined limit iff its component series converge.

Sequences and series can be added and scaled, and the sum of convergent sequences remains convergent. Thus sequences of matrices form a vector space, and similarly for convergent sequences and series.

If the absolute value of a matrix is the absolute value of its entries, then a series of matrices is absolutely convergent iff its component series are all absolutely convergent. A matrix B bounds the sum of matrices in absolute value, iff Bi,j bounds the absolute value of the series for i,j.

Put structure back into the matrices, and take a moment to verify that, for a constant matrix C, multiplying a sequence of matrices by C on the left multiplies its limit by C on the left. (A similar result holds for multiplication on the right.) Look at the entry in row i column j. This is the ith row of C dotted with the jth colum of the nth matrix in the sequence. The individual sequences in the jth column converge, and so does a linear combination of these sequences, as prescribed by the ith row of C. Thus the sequence defined by row i column j converges, and its limit is in row i column j of C×L, where L is the limit of the original sequence. Apply this to all entries, and that completes the proof.

Apply this to partial sums, and multiplying a series by C multiplies the sum by C.

Use similar reasoning when M, a matrix of functions, is continuous at x, so that multiplying M by C multiplies its limit by C. Apply this to the difference quotient, and the derivative of CM is C times M′. Or you could just use the product rule. Since C′ is 0, you are left with CM′(x).

We can multiply a sequence of matrices on the left by a constant matrix, or the right, so why not do both? If P is an invertible matrix, multiply a matrix of sequences, or series, or continuous functions, on the left by P, and on the right by P inverse, and the limit matrix is conjugated by P. Base change respects the limit of a sequence or series of matrices.

Let's apply this to absolute convergence. The real matrix B majorizes the complex matrix A if entries of B are at least the absolute values of the corresponding entries of A. If B majorizes A, and D majorizes C, verify that B+D majorizes A+C, and BD majorizes AC. By induction, Bn majorizes An. If Bn approaches 0 than An also approaches 0.

Let each matrix in a sequence A be majorized by B. The limit of the sequence, if it exists, is majorized by B. Multiply the sequence by C on the left, and let D be the absolute value of C, so that D majorizes C. DB majorizes CAi, and also the limit of CAi.

Let M be an absolutely convergent series of matrices, and let Wi be the absolute value of Mi. Let Ai be the partial sum of W1 through Wi. By the definition of absolute convergence, some matrix B bounds the sequence A. Multiply M by C on the left. If D is the absolute value of C, then CMi is majorized by DWi. DAi majorizes the partial sum of CM1 through CMi. This in turn is bounded by DB. C times an absolutely convergent series remains absolutely convergent.

If P is invertible, then the base change of an absolutely convergent series of matrices, multiplying by P on the left and P inverse on the right, is absolutely convergent.

The rest of this chapter deals with real and complex variables, which is slightly off topic for a book on discrete math. The material is not used elsewhere, so you can skip to the next chapter if you wish. However, this is the beginning of ordinary differential equations, so if you have seen some calculus before, you might enjoy it.

The spectral radius of a matrix is the radius of the smallest circle in the complex plane, centered at 0, that contains all its eigen values. Every characteristic polynomial has at least one root, hence every matrix has at least one eigen value. At the same time, a matrix has at most n eigen values, hence the spectral radius is well defined.

Squaring M squares its eigen values. Hence the radius of M2 is the square of its radius, and so on for higher powers.

What can we say about the spectral radius if Mn approaches 0. In earlier sections we have seen the utility of switching to a similar matrix. If P is invertible then the limit of PMi/P, as a sequence or series, is PL/P, where L is the original limit matrix. If a sequence approaches 0, all similar sequences approach 0. Switch M to jordan form Y, to see if Mn approaches 0. The diagonal elements of Yn are eigen values raised to the nth power, so the spectral radius must be less than 1. Conversely, assume the spectral radius is less than 1, and use the binomial theorem to see that the subdiagonal elements of a simple jordan block become n times the eigen value raised to the n-1. The next diagonal is (n:2) times the eigen value raised to the n-2, and so on. A power of something less than 1, times a polynomial in n, approaches 0. Everything below the main diagonal approaches 0. Therefore Mn approaches 0 iff its spectral radius is less than 1.

In the above, the march to 0 is geometric. Y can be majorized by a real matrix B, also in jordan form, wherein the diagonal of B holds the absolute values of the diagonal of Y. Let the largest of these b k, where k < 1, and the diagonal of Yn is bounded by kn. The subdiagonal is bounded by nkn-1, hence geometric times n. Lower stripes are geometric times a power of n. The sum of Yn is majorized by Bn, which is absolutely convergent. Thus Yn is a convergent series, and so is Mn.

The sum of the first n terms, i.e. the nth partial sum, equals (1-Mn)/(1-M). This approaches a limit matrix, as described above. With Mn approaching 0, the series approaches 1/(1-M).

If M has spectral radius less than 1, 1-M is invertible.

Let f be an analytic function with a radius of convergence of r. Let M be a matrix with spectral radius less than r. Expand f into its power series and evaluate it at M. Does it converge?

Let a nonsingular matrix P convert M to jordan canonical form. Apply P to the power series that defines f(M). The original series converges iff the new series converges. If the original sum was l, the new sum is Pl/P.

Concentrate on a simple jordan block with eigen value z. Remember that z is inside the circle of convergence. The main diagonal becomes f(z), which is convergent. If there are ones below the main diagonal, apply the binomial theorem. The subdiagonal of the nth term is annzn-1. Add these up and the subdiagonal becomes f′(z). Since f is analytic, the derivative exists, hence the subdiagonal converges. The next diagonal converges to f′′(z)/2!, and so on. Put this all together and f(M) converges.

As a special case of the above, let f be the exponential function. The exponential of a matrix M, written exp(M), or eM, is computed using the power series for ez. The circle of convergence is the entire plane, hence eM converges for every matrix M.

Let Y be the jordan form of M, so that eY and eM approach their corresponding limits in lock step. Y can be majorized by B, as described above. Look at a jordan block B. If the diagonal is l, then the series eB approaches el on the diagonal. This series is absolutely convergent, at least on its diagonal, and similarly for eY, where the eigen value of Y, on the diagonal, has absolute value l. The subdiagonal of eB is also absolutely convergent, having terms nln-1/n!, and the lower stripes are absolutely convergent as well. Carry this result back to Y, and eY is an absolutely convergent series. The base change of an absolutely convergent series of matrices is absolutely convergent, hence eM is absolutely convergent, for any matrix M. There is a lot of algebra behind this proof, but it is pretty intuitive. The denominators of the power series for eM are n factorial, and that's going to grow much faster than Mn, no matter the matrix M. The same thing happens in the reals and the complex numbers.

It doesn't happen often, but assume the matrices A and B commute, and expand eA+B into its power series. On the other side, expand eA and eB, and multiply these series together. This is an infinite series times an infinite series, which is the sum of everything in the first series times everything in the second. This converges, regardless of the order of the terms, because eA and eB both converge absolutely.

How do we make the terms correspond? Consider the nth diagonal on the right. This includes An/n! times 1, and 1 times Bn/n!, and everything in between. Multiply this by n!, for convenience. The result is the sum of (n:i)AiBn-i, which is simply (A+B)n. Divide by n! and get (A+B)n/n!. This is the nth term of eA+B. The terms correspond, and eA+B = eAeB.

As a special case of the above, let M be a matrix and let s and t be scalars, and consider esM+tM. Since sM and tM commute, the result is esMetM.

Let s = 1 and t = -1, whence e0 = eMe-M. Therefore eM is invertible, with inverse e-M.

Let Y be a matrix in jordan form and let t be a real or complex variable. What is the derivative of eYt?

Let B be a jordan block of Y with eigen value l. If B is diagonal then eBt is diagonal, and equal to elt. The derivative is leBt, or if you prefer, BeBt.

Let B have ones on the subdiagonal, thus Bt has t on the subdiagonal. We have applied analytic functions to jordan blocks before, using the binomial theorem. The diagonal is once again elt. The subdiagonal has terms tn(lt)n-1/n!, which is the derivative of e, applied to lt, times t, which is telt. The kth stripe down is the kth derivative of e, applied to lt, and multiplied by tk/k!. this is tkelt/k!. Take the derivative using the product rule. The result is elt multiplied by (ltk + ktk-1)/k!.

Instead of taking the derivative, multiply by B on the left or the right. This multiplies each stripe by l and adds the stripe from above. This gives the same formula as above. Therefore the derivative of eBt = BeBt, and the derivative of eYt = YeYt.

Let Y be the jordan form of M. That is, M = PY/P. Conjugation respects addition and multiplication, so eM, expanded into its power series, is a power series in PY/P, which becomes PeY/P. Derivative and base change commute, so the derivative of eM is P times the derivative of eY times P inverse. This is PYeY/P, or PY/P times PeY/P, or MeM. This is the familiar formula from calculus. Take the coefficient on t, in this case the matrix M, and multiply by M on the left or the right.

Consider the differential equation y′(t) = y(t)M, where M is a fixed matrix, and y(t) is a matrix function of t. This looks like one differential equation in one variable, when written in matrix notation, but there are really many functions of t, one for each entry in the matrix y, and many equations that relate these functions to their derivatives. So this is really a system of linear differential equations. Whatever you call it, let's look for solutions.

The function y(t) = eMt is a solution, as shown above. Furthermore, if c is a constant matrix, a constant function of t, ceMt is a solution. Set c to the identity matrix for eMt, and set c = 0 for 0, which is also a solution.

If matrices are n×n, then c provides n2 linearly independent solutions. Put 1 in each of the entries of c, with the rest 0, to produce a basis of solutions. Suppose a linear combination of these solutions yields the zero function of t. This is a nonzero matrix c, times eMt, that is 0. For every t, eMt is invertible. Multiply by e-Mt and get the constant function c. Therefore ceMt cannot be 0.

To complete the characterization, we need to show there are no other solutions. Suppose there is some other solution z(t). Remember that z, like y, is an n×n matrix of functions. Since eMt is everywhere nonzero, divide z by eMt and call the quotient q. In other words, the solution is q(t)eMt. Use the product rule to differentiate.

q′eMt + qeMtM = qeMtM

q′eMt = 0

Q′ = 0  ( multiplying both sides by e-Mt )

q = c  ( for some constant matrix c )

All solutions have been identified, and the solutions form a vector space of dimension n2, as dictated by the constant matrix c.

The analogous differential equation y′ = My has solutions eMtc, where the constant matrix c is multiplied on the right.

If M is a matrix of constants, the derivative of Mt, with respect to t, is M. Thus the derivative of eMt is eMt times the derivative of the exponent. This begins to look like the chain rule from calculus. Let's try to generalize this to arbitrary functions in the exponent.

Let U(t) be a matrix of functions in t, where each function is analytic. Consider the derivative of eU(t). Expand eU into its power series. Since functions are analytic, the series can be differentiated term by term. (This simple sentence entails several chapters of analysis, which are not part of this book.) As described earlier, the derivative of Un is nUn-1, even if U is a matrix of functions. Pull U′ out of the resulting series, and get U′eU, which is what you would expect from calculus.

Now return to our differential equation, but this time consider y′ = yV, where V is a function of t. A proof similar to the one shown above characterizes the solutions as ceU, where c is a matrix of constants, and U′ = V. Similarly, the solutions to y′ = Vy are eUc.

In the above, V is a square matrix, but y need not be. As long as the dimension of y allows for multiplication by V, we're ok. The constant matrix c will be the same size as y. The solution space is a vector space whose dimension is the number of entries in c, or in y.

If the nth derivative of y is a linear combination of lesser derivatives of y, then the solutions, i.e. the functions of y that satisfy this differential equation, are well characterized. This is the fundamental theorem of ordinary differential equations. Here is a sample differential equation of order 2.

y′′ - 37sin(x)y′ + (6x2-4)y = 3/(x+7)

Before we dive in, you may want to review the related theorem for discrete difference equations. That theorem can be proved in a few paragraphs, while the continuous version requires everything in this chapter, plus real and complex analysis, plus integral and differential calculus. With that in mind, the following is merely an outline. This is not a rigorous δ ε proof.

You are given a linear differential equation of order n, where all coefficients are continuous functions of x, well defined throughout an open interval, or an open disk if you are working in the complex plane. Assume this interval, or disk, contains the origin. After all, we can always move the origin by replacing x with x+s, for some s in the domain where the functions are defined, then replace x with x-s in the solution.

The lead coefficient, on the nth derivative, must be 1. If the lead coefficient is some other function of x, assume that function is nonzero about the origin, then divide through to get a lead coefficient of 1.

Finally, the initial conditions are known at the origin. You are given y(0), y′(0), y′′(0), and so on. When all these conditions are met, The equation has a solution, and that solution is unique. The procedure outlined below may not be particularly useful for solving the equation. It merely demonstrates the existence, and uniqueness, of a solution.

Although it is counterintuitive, we need to turn the problem into a system of n differential equations in n variables. That actually makes the problem easier. Let y0(x) be y(x), the function that we seek. Let y1(x) equal y′. Let y2 = y′′, and so on up to yn-1, which is the n-1st derivative. We won't need a separate variable for the nth derivative.

The initial value for the ith derivative of y is assigned to yi(0). These are the initial conditions that are provided, along with the differential equation.

For each i in [0,n-2], write the equation:

yi′ = yi+1

The n functions y0, y1, y2, etc, use to be independent, but they're not any more. Each is the derivative of the previous, and by induction, the ith function yi is the ith derivative of y(x).

For the last equation, rewrite the original differential equation, substituting yi for each derivative in turn. The leading term, the nth derivative of y, becomes yn-1′. We now have a system of n first order differential equations in n variables. Verify that the system of first order equations has a solution iff the original differential equation has a solution.

The task before us is to solve the system of simultaneous first order equations. Build an n×n matrix P that is zero everywhere, except for the following entries. The n-1 entries just above the main diagonal are set to -1. The bottom row contains the coefficients on the derivatives of the original differential equation, starting with the coefficient on y at the lower left, and ending with the coefficient on the n-1st derivative of y at the lower right. Remember, these coefficients are functions of x, hence the bottom row of P contains functions of x, and P is a matrix of functions.

Let y be the column vector [y0,y1,y2…yn-1]. Let y′ be the term by term derivative of y. Consider the expression y′ + Py, where Py is evaluated using matrix multiplication. The result is a column vector; let's take it from the top.

Take the dot product of y and the top row of P to get -y1. This is added to the first entry in y′, which is y0′. The result, y0′-y1, is equal to 0.

Move on to the next entry. Take the dot product of y and the second row of P to get -y2. Add this to y1′, and again, the sum is 0. This continues through the penultimate row of P. In other words, the first n-1 entries in y′+Py are zero.

To compute the last entry, take the dot product of y with the bottom row of P, then add the derivative of yn-1. The result is the left hand side of our original differential equation, and the answer has to be r(x), the right hand side of that equation.

Let r be the column vector that is all zeros except for r(x) at the bottom. Verify that our system of n equations has a solution iff we can find a column vector y satisfying y′ + Py = r. The system of simultaneous equations is the same, we're just writing it using concise matrix notation.

If r = 0, the homogeneous case, then the solution is given in the previous section. The solution space is an n dimensional vector space, determined by a constant n by 1 matrix c, and the solution is e-Uc, where U′ = P.

If r(x) is nonzero, subtract any two solutions to find a homogeneous solution with r(x) - r(x) = 0 on the right hand side. (This because the derivative of the difference is the difference of the derivatives.) Therefore the solution space is a shifted n dimensional vector space, i.e. the homogeneous solutions plus any solution for r(x).

Let S be the matrix e-U, where U′ = P. Since S is invertible, let y = Sz. Here z is a column vector of functions, just like y. Then apply the product rule to (Sz)′. This produces the following equations.

(Sz)′ + PSz = r

Sz′ - PSz + PSz = r

Sz′ = r

z′ = (1/S)×r

z′ = eU×r

Integrate both sides to find z. Integrate from 0, so that at t = 0, z = 0. Of course z is adjusted by a constant of integration. The constant c is a vector of length n, as each function is integrated. Thus y = S*(z+c).

At t = 0, the integral of -P is 0, and e to this power is 1, hence S, at time 0, is the identity matrix. As column vectors at time 0, y = z + c. Since z = 0, y = c. The first entry of c is y(0) (in the original equation), or y0(0) in our system of first order equations. The second entry has to be y′(0), or y1(0). The third entry is y′′(0), or y2(0), and so on. The function y is completely determined. It exists, and it is unique, and that completes the proof.

Let's close with a modest example - y′′ + y′ + y = cos(x). I picked this so that sin(x) will be one of the solutions. But we want to find all the solutions.

The 2×2 matrix P is [0,-1] [1,1]. Integrate this to get U = [0,-x] [x,x]. At this point we should convert -U to jordan form, apply the exponential, then convert back, but that looks like too much work. Ignoring the powers of x, multiply (-U) by itself, and find that (-u)3 = 1, setting up a repeating pattern with period 3. Starting with (-U)0, the upper right runs 0, 1, -1, repeat, with increasing powers of x, and n factorial in the denominator. The repeating pattern gives the ongoing derivatives of an analytic function f, which is (ewx - ewwx) / sqrt(3), where w is the cube root of 3. The lower left is -f. The lower right runs 1, -1, 0, repeat, which is (wewx - wwewwx) / sqrt(3). Finally the upper left is (wewwx - wwewx) / sqrt(3). The homogeneous solutions are at hand. The matrix S is multiplied by the column vector of constants on the right. The bottom row of S is the derivative of the top row of S, as it should be. Thus the top row of S, times c, gives y(), and the bottom row of S, times c, gives y′().

S =
 (wewwx - wwewx) / sqrt(3) (ewx - ewwx) / sqrt(3) (ewwx - ewx) / sqrt(3) (wewx - wwewwx) / sqrt(3)
*
 c0 c1

After simplifying, y is any linear combination of ewx and ewwx. Verify that y′′ + y′ + y = 0.

You're right - the coefficients on the derivatives of y are constants, and thus there is an easier way to find the homogeneous solutions. I'll address that in the next section. Still, the machinery described above is necessary if you want to find a solution for r(x), and in this example, r(x) = cos(x).

The determinant of S is -e-x. Invert S (adjoint over determinant), multiply by the column vector [0,cos(x)], integrate, then multiply by S on the left to find y and y′. It's not fun, but it's doable. I'll just assume that you get sin(x) out of the deal, or perhaps sin(x) + a homogeneous solution.

When the coefficients of a linear homogeneous equation are constants, the equation can be solved using exponentials. You've already seen the proof in chapter 6. Those were difference equations, i.e. the discrete case, but the proof is virtually identical for continuous differential equations. In fact it's easier, since you don't have the pesky exception of r = -1, and it's somewhat more intuitive. I'll run through it here, just for completeness.

Write the differential equation this way.

a0y + a1y′ + a2y′′ + … (up to an × the nth order derivative) = 0

Let p(x) be the corresponding polynomial a0+a1x+a2x2+…anxn. Let r be a root of p(x), and let f = erx. Plug f into our differential equation and get p(r)×erx. Since p(r) = 0, f is a solution.

Now assume r has multiplicity m. In other words, p(x) has m factors of x-r. Build a new solution via f = q(x)×erx. If q = 1 then f is the function described above, and we're done.

Show that f′ = (rq + q′) erx, and f′′ = (r2q + 2rq′ + q′′) erx. In general, the "next" derivative is found by taking the factor on the left of erx, call it w, and replacing it with rw + w′. Thus the third derivative is (r3q + 3r2q′ + 3rq′′ + q′′′) erx. This will remind you of Pascal's triangle.

When we search for a solution, the factor on the right, erx, is always nonzero. Divide it out; it simply goes away. All that remains is the first factor, the one that looks like a binomial expansion of q and r.

As the derivatives march along, f′ f′′ f′′′ etc, pull out the first term in each expansion. These are multiplied by a1 a2 a3 etc. This gives p(r)×q(x), which is known to be 0, since r is a root. Thus the first term in the expansion of the derivatives goes away.

Multiply the second term in each expansion by its coefficient and get something like this.

a1q′ + 2a2rq′ + 3a3r2q′ + 4a4r3q′ + … (up to an)

This is the same as p′(x), evaluated at r, and multiplied by q′(x). Invoke the theory of formal derivatives. If p has multiple roots at r, then p′(r) = 0. This entire sum drops out.

Now take the third term in the expansion of each derivative. Premultiply by the coefficients and get this.

a2q′′ + 3a3rq′′ + 6a4r2q′′ + 10a5r3q′′ + …

Confused about the "extra" coefficients? They come from the binomial theorem: 2 choose 2, 3 choose 2, 4 choose 2, 5 choose 2, and so on. In other words, k×(k-1)/2. Multiply the above expression through by 2 to get this.

2×1×a2q′′ + 3×2×a3rq′′ + 4×3×a4r2q′′ + 5×4×a5r3q′′ + …

This is equal to p′′(x), evaluated at r, times q′′(x). If the multiplicity of r is at least 3, i.e. p has at least 3 roots of r, then p′′(r) = 0, and this entire expression goes away.

Collect the fourth term from each derivative, multiply by the respective coefficients, then multiply by 6. The result is p′′′(x), evaluated at r, times q′′′(x). If the multiplicity m is at least 4, p′′′(r) = 0, and this expression goes away. This continues, until we build the mth expression, whence the mth derivative of p, evaluated at r, is no longer 0. At this point the remaining expressions must drop to 0. The best way to do that is to have the mth derivative of q disappear. In other words, the mth derivative of q = 0, as when q = xm.

As shown earlier, the solutions form a vector space. We have identified a subspace of solutions corresponding to the root r. The subspace is q(x)×erx, where q is the set of polynomials of degree < m. There is a convenient basis for this subspace:

erx, xerx, x2erx, x3erx, … xm-1erx

Apply this result to each root r in p(x), and find n different solutions. Then demonstrate linear independence using the same proof as the discrete case. The notation in chapter 6 is substantially different, so I will present the proof again here.

If all the functions in a linear combination c1 are pure exponential, the proof is easy. Evaluate l such functions on the integers from 0 to l-1, and build an l by l square matrix. This is vandermonde, hence the rows of the matrix, and the exponential functions that build these rows, are independent.

If all the functions in a linear combination c1 are based on the same exponential erx, divide through by erx, since erx is everywhere nonzero. That leaves a linear combination of powers of x, also known as a polynomial. A polynomial cannot have infinitely many roots, and cannot be everywhere 0. The supposed linear combination c1 does not exist.

At this point any linear combination of functions that yields 0 has to include more than one exponential, and at least one modified exponential. Select a linear combination c1, yielding 0, that has the smallest maximum modifier xm, and the fewest number of exponentials carrying this modifier. If c2 is any other nontrivial linear combination yielding 0, then its maximum modifier xk has k > m, or it has more exponentials tied to xm.

Let f be a function that is part of c1. Recall that f′ has the form: (rxk + kxk-1) × erx. Since xkerx in our basis implies xk-1erx, f′ is a linear combination of basis functions. Do this for all the functions in c1 and build c2, another linear combination of basis functions that yields 0.

Select x^m×1 if you can. This will drop to xm-1 in c2, and any other functions based on xm will remain. Thus c2 has fewer functions with xm, which is a contradiction. If c1has no other functions with xm, then we need to make sure c2 is nontrivial. c1 has to contain more than one exponential, and the other exponentials, besides 1, persist in c2. Thus c2 is nontrivial, and represents a contradiction.

Now select a function xmerx, where m > 0, and r ≠ 0. Both c1 and c2 contain this function. Let c3 = c1 - c2/r. Notice that the term xmerx drops out. There are now fewer terms with xm, or perhaps none, and in that case c3 presents a smaller value of m. We only need show c3 is nontrivial.

Remember that c1 is based on more than one exponential. Consider any other exponential in c1, such as eux. This is multiplied by u in c2, and 1-u/r in c3. Since u ≠ r, the term persists in c3, giving a nontrivial linear combination.

That completes the proof. The solution space has dimension n, and all solutions have been characterized. A linear differential equation with constant coefficients can be solved by finding the roots and multiplicities of the corresponding polynomial, and building exponentials and xk exponentials accordingly.