Abstract Algebra and Discrete Mathematics, Difference Equations

Difference Sequence / Difference Equation
Existence and Uniqueness
Dimensionality
Polynomial Solutions
Sum of n^th Powers
Homogeneous Linear Equation with Constant Coefficients
Phase Shifted Functions
The Golden Ratio
The Fibonacci Sequence
Fibonacci Commutes with GCD
The Fibonacci Matrix

Although calculus is not a requirement for this topic, an understanding of ordinary differential equations is recommended. As you will see, there are similarities between difference equations and differential equations. The former is a quantized version of the latter, or if you prefer, the latter is a continuous version of the former.

Let y(x) be a sequence of real or complex numbers, or perhaps elements of some other ring or field. Although x looks like a real variable, it takes on the values 0, 1, 2, 3, 4, and so on, building the sequence y(x).

As you recall from calculus, y′ is the derivative of the continuous function y(x). When x is discrete, y′(x) = y(x+1)-y(x). This builds the "difference sequence", which is analogous to the derivative, hence the borrowed notation.

There are higher order difference sequences, just as there are higher order derivatives. The sequence y′′(x) = y′(x+1)-y′(x). This can be expanded to y(x + 2) - 2y(x+1)+y(x). A recursive procedure builds the n^th order difference sequence. The n^th difference, evaluated at x, looks like the n^th row of pascal's triangle, although the coefficients alternate in sign. For example, the fourth difference sequence, evaluated at x, looks like this.

y(x+4) - 4y(x+3) + 6y(x+2) - 4y(x+1) + y(x)

Prove this relationship in general by induction on n.

Like differentiation, taking the difference sequence is a linear operator. If y and z are two sequences of x, and c is any constant, (y+z)′ = y′ + z′, and (cy)′ = cy′.

Now we are ready to define the difference equation, similar to the differential equation. Let f be a function that is evaluated at x, y(x), y′(x), y′′(x), and so on. This function must equal 0 for all x ≥ 0. The difference equation is "solved" when y(x) is found.

In most situations, the n^th order difference stands alone. It is set to a function of x, y, y′, and the other differences up to n-1. This is similar to a differential equation that sets the n^th derivative of y equal to a function of x, y, and the lower derivatives of y.

Assume the n^th difference sequence equals f(x, y, y′, y′′, …), out to the penultimate sequence at level n-1, as described in the previous section. Also assume some initial conditions are known. Specifically, let y(0) = c₀, y′(0) = c₁, y′′(0) = c₂, and so on up to c_n-1. Given these initial conditions, and f, a deterministic procedure builds y(1), y(2), y(3), and so on. This is existence and uniqueness, and the proof is much simpler than the δ ε proof in the continuous world.

Think of c as a matrix with n+1 rows and infinitely many columns. Start numbering at 0, rather than 1, for convenience. The top row is y(x), the next row is y′(x), and so on. The top row will be our solution.

The initial conditions run down the leftmost column, leaving the lower left entry, c_n,0, blank.

Since y and y′ are known at 0, and y′(0) = y(1) - y(0), solve for y(1). Specifically, y(1) = y(0) + y′(0). This fills in the top entry in the second column, also known as c_0,1.

In the same way, y′(1) = y′′(0) + y′(0). This fills in the next entry in the second column, also known as c_1,1. Continue down the column, filling in all but the last two rows.

Move on to x = 2, which is the third column. Calculate y(2), y′(2), y′′(2), and so on, leaving the bottom three entries blank. Then fill in the next column, and the next, out to y(n-1). That's as far as we can go based solely on the initial conditions.

In this illustration, the third difference is a function of x, y, y′, and y′′. Stars indicate the values that are set by the initial conditions. A question mark shows the first value that will be calculated by f.

x	0	1	2	3
c
y	*	*	*
y′	*	*
y′′	*
y′′′	?

The initial conditions can also be given as y(0) through y(n-1). This runs across the top row, rather than down the left column. The next row, y′, can be calculated out to y′(n-2). The next row can be calculated out to n-3, and so on, filling in the same triangle.

Now it is time to fill in the rest of the matrix, starting at the lower left. Fill in the lower left entry by applying f to the values above. Once this is done, the entry up and to the right can be calculated. It is simply the sum of c_0,n and c_0,n-1. Continue filling in the values up and to the right, culminating in y(n). This builds the next diagonal stripe.

At this point c_1,n can be calculated, using the values above. Fill this in, and proceed up and to the right, culminating in y(n+1). The entire matrix flows up and to the right, until the top row, y(x), is uniquely determined for all x.

this theorem is very mych like its counterpart in differential equations.

Let e be an n^th order homogeneous linear difference equation. What does this mean? It means y, y′, y′′, and so on, are multiplied by various functions of x, and their sum is 0. Here is an example of order 2.

(x³+7x+5)y′′ - e^xy′ + sin(x)y = 0

Since e is linear and homogeneous, solutions can be added and scaled. If f and g are solutions then so is every linear combination of f and g. Hence the set of solutions forms a vector space.

If f(x) is a solution, evaluate f(0), f′(0), f′′(0), and so on, through the first n-1 differences. This builds a vector v in n space. Call this map M. Thus M(f) = v, mapping a solution function f to its differences at 0.

Verify that m respects addition and scaling. The differences of f+g are the differences of f plus the differences of g. Therefore M is a linear transformation from one vector space into another. The solution space of e has been mapped into n-space.

The previous theorem asserts existence and uniqueness for every vector of initial conditions. This is the inverse of the map described above. Given v, there is a unique f such that M(f) = v. Therefore m is surjective and injective. The set of solutions is isomorphic to n dimensional space.

[ region labeled initial conditions on the left, a 3 dimensional vector space, spanned by v1 v2 v3, region on right labeled solution space, a 3 dimensional vector space, spanned by f1 f2 f3, double arrows connecting v1 f1, v2 f2, v3 f3 ]

There is a catch. The lead coefficient of e has to be invertible for every x. Divide through by this lead coefficient to write the highest difference as a function of all the others. This is necessary if we are going to turn v back into f, as described in the previous section. In our example, the lead coefficient, x³+7x+5, is nonzero for all x ≥ 0, so we're ok.

Given a set of initial conditions, how can you find the precise solution to the difference equation e? This is an exercise in linear algebra. Find a basis of solutions for e, and evaluate their j^th order differences at 0, thus building a basis of vectors in n-space. Write your vector of initial conditions as a linear combination of these basis vectors, then the same linear combination of functions satisfies e with the prescribed initial conditions.

If the right hand side of e is changed from 0 to r(x), so that e is no longer homogeneous, the solution set is a shifted vector space. Find one solution, h(x), and all solutions are of the form h+f, where f is a solution to the homogeneous equation. The dimension of the solution space is still n, but the solutions have all been shifted by h(x).

The simplest n^th order equation, call it e, sets the n^th difference to a constant c. This is a linear difference equation. As per the previous section, the solutions of e form an n dimensional space, corresponding to the vectors of initial conditions.

For any set of initial conditions, the solution is always an n degree polynomial. This is similar to differential equations, where repeated integration of c yields a polynomial. Of course the polynomials are not the same. Consider y′′ = 2, with y(0) = 3 and y′(0) = 0. If this were a differential equation, the solution would be the parabola y(x) = x²+3. When viewed as a difference equation, the solution is x²-x+3. The first three values of y are 3, 3, 5; y′ starts out 0, 2; and y′′(0) is 2. So both functions, solving the differential equation and the difference equation, are polynomials, but they are not the same.

To help us solve these equations, we are going to generalize the binomial coefficient. As you recall, (n:j), or n choose j, is n times n-1 times n-2 … down to n-j+1, divided by j factorial. Although n is usually a number, it could be a variable, such as x. Thus (x:2) = x×(x-1) / 2, and (x:3) = x×(x-1)×(x-2) / 6. By definition, (x:0) = 1.

Here is the power behind this construct. If y(x) = (x:j), then y′(x) = (x:j-1). Let's verify this algebraically; it's beautiful.

s = x × (x-1) × (x-2) × … × (x-j+2)

(x+1:j) - (x:j) = s × ((x+1) - (x-j+1)) / j! =
s × j / j! = s / (j-1)! = (x:j-1)

Once again, e sets the n^th difference to c. If the initial conditions are all zero, the solution to e is y = c×(x:n). The j^th difference sequence is c×(x:n-j), and for j < n, this equals 0 when evaluated at 0. Finally the n^th difference equals c×(x:0), which is c across the board. Thus e, setting the n^th difference to c, and having initial conditions of zeros, has been satisfied.

In the above example, think of c as c_n. Let c₀ through c_n-1 represent the initial conditions. In other words, y(0) = c₀, y′(0) = c₁, y′′(0) = c₂, and so on. As before, e sets the n^th difference to the constant c_n. The unique solution is as follows.

y(x) = c₀ + c₁x + c₂(x:2) + c₃(x:3) + … c_n(x:n)

Let's try an example. Solve y′′′ = 8, where y(0) = 1, y′(0) = 2, and y′′(0) = 3. The solution is 8(x:3) + 3(x:2) + 2(x:1) + (x:0), which yields the following cubic polynomial.

y(x) = (8x³ - 15x² + 19x + 6) / 6

Any linear combination of binomial coefficients can be rewritten as a traditional polynomial, and vice versa. The translation preserves the degree of the polynomial, and is one to one.

With this in mind, consider a difference equation that sets the n^th difference equal to a polynomial p(x). Rewrite p(x) as a linear combination of binomial coefficients. These can be "integrated" term by term. Replace each (x:j) with (x:n+j). Bring in additional terms below (x:n) for the initial conditions. Finally, turn the expression back into a polynomial.

For example, solve y′′ = x² + x + 1. Rewriting the polynomial yields 2(x:2) + 2(x:1) + (x:0). Shift this expression by 2, giving 2(x:4) + 2(x:3) + (x:2). If the initial conditions are both zero, there are no terms below (x:2). Convert back to a polynomial and get this.

y(x) = (x⁴ - 2x³ + 5x² - 4x) / 12

When Gauss was ten years old, his teacher gave his class an assignment that was sure to keep the kids busy for quite a while. The students were told to add up the numbers from 1 to 100. Gauss grouped the integers into pairs: 1+100, 2+99, 3+98, 4+97, etc, and there were 50 such pairs, giving an answer of 5050. He placed his slate on the teacher's desk, sat back down, and waited for his classmates to finish their laborious calculations. (In some versions of the story, a different arithmetic progression is assigned, but the idea is the same.)

A child prodigy, Gauss had discovered the formula for the sum of the first n integers, namely n×(n+1)/2. You are probably familiar with this formula, but what about the first n squares, or the first n cubes? Regrouping the terms doesn't seem to help at all. It looks like we're lost at sea, but difference equations are going to come to the rescue.

Return to the sum of the first x integers, and write the problem as a difference equation: y′ = x+1. The difference between the sum of the first x+1 integers, and the sum of the first x integers, is x+1. The previous section presents a procedure for solving equations of this type. Sure enough, the solution is (x:2) + x, or (x²+x)/2.

The above can be generalized to n^th powers. Let y(x) be the sum of jⁿ, as j runs from 0 to x. Now y′(x) = (x+1)ⁿ. This is a first order linear difference equation. Rewrite this using binomial coefficients, integrate, and convert back to a polynomial. For instance, when n = 2, (x+1)² = 2(x:2)+3(x:1)+(x:0). Shift the binomial coefficients, and convert back to a polynomial, giving the third entry in the following table.

y₀(x) = x { adding x ones together}

y₁(x) = (x² + x) / 2 { sum of the first x integers }

y₂(x) = (2x³ + 3x² + x) / 6 { sum of the first x squares }

y₃(x) = (x⁴ + 2x³ + x²) / 4 { sum of cubes }

y₄(x) = (6x⁵ + 15x⁴ + 10x³ - x) / 30 { sum of fourth powers }

Note that (x+1)ⁿ has, as its first binomial coefficient, n!×(x:n), which becomes n!×(x:n+1), which produces the leading term xⁿ⁺¹/(n+1). The sum of n^th powers, up to x, is a polynomial whose leading term is xⁿ⁺¹/(n+1). For large values of x, this is a good approximation.

Around 350 B.C., Aristotle and Archimedes formalized the method of exhaustion, and practically invented integral calculus, some 2,000 years before Riemann. In particular, they knew that the integral of xⁿ was xⁿ⁺¹/(n+1).

Draw a vertical line at x and consider the area under the curve from 0 to x. Divide it into j thin rectangles, each of width x/j. These rectangles, collectively, consume an area that is x/j times the sum of (kx/j)ⁿ, as k runs from 1 to j. Pull (x/j)ⁿ out of the sum, and you are left with the sum of kⁿ, which is approximately jⁿ⁺¹/(n+1). Put this all together and the area is xⁿ⁺¹/(n+1).

The Greeks didn't develop a full theory of difference equations, but they knew that the sum of the first j n^th powers was approximately jⁿ⁺¹/(n+1). Their integrals, trapping the area between lower and upper sums, were surprisingly rigorous.

If a differential equation has constant coefficients on the derivatives of y, the solution consists of exponentials, based on the roots of the characteristic polynomial having the same coefficients. A famous example from physics is y′′ = -y, or y′′ + y = 0. The force pulling you back to the origin is proportional to the distance y from the origin, like a spring bouncing back and forth. The solution is a sine wave, up and down, up and down. The characteristic polynomial is r²+1 = 0, corresponding to y′′+y = 0, and the roots are i and -i. The solution is e^ix - e^-ix, which is a sine wave.

The solution to a difference equation is also based on exponentials, and the roots of the characteristic polynomial. We just have to tweak things a bit.

Let r be a real or complex constant, and let s = r+1. Let y(x) = s^x. Evaluate y′, s^x+1 - s^x, and get rs^x, or ry. By induction, the n^th difference becomes rⁿy.

In what follows, w will be shorthand for s^x. Let y(x) = xw. Evaluate y′ and get (rx+s)w. Let's try to generalize this.

Let y(x) = (x:k)w, for any k > 0. Evaluate at x+1 and the factors of (x:k) shift upwards by 1. This is multiplied by s^x+1, or sw. Then subtract (x:k)w. Pull w out, and look at what remains.

(x+1) × (x:k-1)s/k - (x:k)

Add and subtract (x:k-1)s.

(x+1) × (x:k-1)s/k - (x:k-1)s + (x:k-1)s - (x:k)

(x+1) × (x:k-1)s/k - k(x:k-1)s/k + (x:k-1)s - (x:k)

(x-k+1) × (x:k-1)s/k + (x:k-1)s - (x:k)

(x:k)s + (x:k-1)s - (x:k)

(x:k)s - (x:k) + (x:k-1)s

(x:k)r + (x:k-1)s

Bring w back in and get this.

y′ = ((x:k)r + (x:k-1)s) × w

If w is multiplied by a linear combination of binomial coefficients, apply the above formula term by term. Split each term into two, one with a factor of r and one with a lesser degree and a factor of s. Therefore the second difference is as follows.

y′′ = ((x:k)r² + 2(x:k-1)rs + (x:k-2)s²) × w

y′′′ = ((x:k)r³ + 3(x:k-1)r²s + 3(x:k-2)rs² + (x:k-3)s³) × w

This will remind you of Pascal's triangle. Factors of r build up at one end and factors of s build up at the other. I'm going to call the entire sum (without w) a "pascal expression", since it looks like a line in his triangle.

Some of these pascal expressions will be truncated. If k = 2, then (x:k-2) = 1, and there are no terms beyond this point.

y′′ = ((x:2)r² + 2(x:1)rs + (x:0)s²) × w

y′′′ = ((x:2)r³ + 3(x:1)r²s + 3(x:0)rs²) × w

These exponential functions, and polynomial exponential functions, will solve the difference equation, just as their counterparts did in calculus.

Let a_i be the coefficient on the i^th term in the difference equation. Thus the equation is a₀y + a₁y′ + a₂y′′ + … up to a_n times the n^th difference, equals 0. Here is the corresponding characteristic polynomial p(u).

a_nuⁿ + a_n-1u^n-1 + … + a₂u² + a₁u + a₀

This has n roots in the complex plane, although some of the roots may be repeated. Let r be such a root, and let s = r+1. Let w = s^x, and start by setting y = w. Evaluate the i^th difference and get rⁱy. This is multiplied by a_i. Factor out y, and find p(r)×y, which drops to 0. The difference equation is satisfied, and s^x is a solution.

For example, 0 could be a root, if there is no y term. This tells us the constant function 1^x is a solution, and indeed, all the difference sequences drop to 0, and 1 is a solution.

If r = -1 we have a problem. The solution is 0^x, which is trivial. Try y′ = -y as an example, with an initial condition of y(0) = 1. Set y(1) = 0, so that y′(0) = -1, as the equation demands. Thereafter, set y(x) = 0. This gives a nontrivial solution that is not exponential. The higher order differences at 0 alternate between 1 and -1, while the differences for x > 0 are all 0. I will call this a finite solution, because y(x) drops to 0 and stays there. Finitely many values are nonzero.

x	0	1	2	3
c
y	1	0	0	0
y′	-1	0	0	0
y′′	1	0	0	0
y′′′	-1	0	0	0

Assume r (other than -1) is a double root, and let y = xw. This is also known as (x:1)w. Each difference, starting with the n^th order difference, is a truncated pascal expression of length 2. Group the first term in each of these pascal expressions together to get p(r)×x. This drops to 0.

The second term in each expression is divisible by s; factor that out. That leaves na_nr^n-1 + (n-1)a_n-1r^n-2 + … + 2a₂r + a₁. This is the derivative of p, evaluated at r. Review the theory of formal derivatives. When r is a multiple root, p′(r) = 0. The second terms from the pascal expressions sum to zero. When r is a root with multiplicity 2 or higher, xw is a solution.

Next assume r has multiplicity 3 or higher. Let y = (x:2)w. The lead terms of the pascal expressions sum to p(r)×(x:2), which is 0. Group the second terms together and get p′(r)xs, which is also 0. This time the third terms are present. Gather them together, factor out s², multiply by 2, and get this.

n(n-1)a_nr^n-2 + (n-1)(n-2)a_n-1r^n-3 + … + 12a₄r² + 6a₃r + 2a₂

This is p′′(r), which is 0 when r has multiplicity ≥ 3. The expressions truncate at this point, hence (x:2)w is a solution.

If r has multiplicity at least 4, let y = (x:3)w. The first, second, and third terms of the pascal expressions regroup, as they did before, giving p(r), p′(r), and p′′(r), which drop to 0. Group the fourth terms together, factor out s³, multiply by 6, and find a sum equal to p′′′(r), which is 0. The pattern continues up to (x:m-1)w, where r has multiplicity m.

If these exponential and polynomial exponential solutions are linearly independent - one function for each root of the characteristic polynomial p(x) - then by dimensionality the entire solution space of dimension n has been spanned. The difference equation has been solved.

One is tempted to separate the equations by their behavior as x approaches infinity, sometimes called big O notation, but I will prove independence using only algebra. Thus the same proof works for any field containing Z, not just the real or complex numbers. This proof also works, with minor tweaks, for differential equations.

If all the functions in a linear combination c₁ are pure exponential, the proof is easy. Evaluate l such functions on the integers from 0 to l-1, and build an l by l square matrix. This is vandermonde, hence the rows of the matrix, and the exponential functions that build these rows, are independent.

If all the functions in a linear combination c₁ are based on the same exponential s^x, divide through by s^x, since s^x is everywhere nonzero. That leaves a linear combination of functions (x:k) for various values of k. This can be rewritten as a traditional polynomial, whose degree is the highest value of k in (x:k) in c₁. A polynomial cannot have infinitely many roots, and cannot be everywhere 0. The supposed linear combination c₁ does not exist.

At this point any linear combination of functions that yields 0 has to include more than one exponential, and at least one modified exponential. Select a linear combination c₁, yielding 0, that has the smallest maximum modifier (x:m), and the fewest number of exponentials carrying this modifier. If c₂ is any other nontrivial linear combination yielding 0, then its maximum modifier (x:k) has k > m, or it has more exponentials tied to (x:m).

Let f be a function that is part of c₁. Recall that f′ has the form: ((x:k)r + (x:k-1)s) × s^x. (This even works if s = 1 and r = 0). Since (x:k)s^x in our basis implies (x:k-1)s^x, f′ is a linear combination of basis functions. Do this for all the functions in c₁ and build c₂, another linear combination of basis functions that yields 0.

There is at least one exponential associated with (x:m), call it (x:m)s^x. If possible, select (x:m)1^x. This will drop to (x:m-1) in c₂, and any other functions based on (x:m) will remain. Thus c₂ has fewer functions with (x:m), which is a contradiction. If c₁has no other functions with (x:m), then we need to make sure c₂ is nontrivial. c₁ has to contain more than one exponential, and the other exponentials, besides 1^x, persist in c₂. Thus c₂ is nontrivial, and represents a contradiction.

At this point c₁ has one or more functions with (x:m), and none of these are pure polynomials. Select one of these functions, (x:m)s^x, remembering that m > 0, and s ≠ 1. Both c₁ and c₂ contain this function. Let c₃ = c₁ - c₂/r. Notice that the term (x:m)s^x drops out. There are now fewer terms with (x:m), or perhaps none, and in that case c₃ presents a smaller value of m. We only need show c₃ is nontrivial.

Remember that c₁ is based on more than one exponential. Consider any other exponential in c₁, such as u^x. This is multiplied by u-1 in c₂, and 1-(u-1)/r in c₃. Since u-1 ≠ r, the term persists in c₃, giving a nontrivial linear combination.

That completes the proof. The functions are independent, and the solution space has been characterized.

Oops - what about the pesky root -1, with multiplicity m? Let y(x) = 0, except for y(m-1) = 1. Taking the differences at x = m-1 alternates between 1 and -1, just as it did before. The pattern has shifted from column 0 to column m-1. Everything to the right of this column is 0.

When x is m or larger, the entire column is 0, and the difference equation e is satisfied. When x = m-1, replace y with 1, y′ with -1, y′′ with 1, y′′′ with -1, and so on. This effectively replaces u, in our characteristic polynomial p(u), with -1, which is a root. Thus e is satisfied for all x ≥ m-1.

Notice that the alternating sequence of 1 and -1 agrees with the coefficients of 1/(1+v). Look at the differences to the left, at column m-2. Since y(m-2) is 0 by definition, y′(m-2) has to be 1. To get y′′(m-2), subtract y′(m-1)-y′(m-2), giving -2. Follow this down the column, and we are essentially dividing the series 1/(1+v) by 1+v, and shifting it down one row. Therefore the differences at m-2 are the coefficients on v/(1+v)². By induction, the differences at m-j are the coefficients on v^j-1/(1+v)^j.

0	0	0	1
0	0	1	-1
0	1	-2	1
1	-3	3	-1
-4	6	-4	1
10	-10	5	-1

We already showed y solves e for x = m-1 and beyond. Let's look at x = m-j. Since p has m factors of u+1, it certainly has j factors, and if the differences of y cause q(u) = (u+1)^j to drop to 0 then p(u) is satisfied, as is e.

Expand (u+1)^j bythe binomial theorem, to find q(u). Plug in y and the differences of y, as shown in column m-j. Most of these are 0. In fact they are all 0 except for the last two, which are 1 and -j. Replace u^j with -j, and replace u^j-1 with 1. This gives 1×-j + j×1 = 0. q(u) = 0, p(u) = 0, and e is satisfied at x = m-j. This holds for all j, hence y is a solution for e.

When the root -1 has multiplicity m, there are m linearly independent solutions. The j^th solution sets y(m-j) = 1, while y is 0 elsewhere. Combine these, and any finite vector of length m, followed by zeros, is a solution.

These finite solutions join the exponential solutions to build a basis for the entire solution space of our homogeneous linear difference equation with constant coefficients.

Let e be a linear combination of the 8 functions y(x) through y(x+7). Set e = 0 to create a difference equation that involves phase shifted functions rather than difference sequences.

As you recall, the 7^th order difference sequence of y, evaluated at x, is a linear combination of y(x) through y(x+7). In fact it looks like a line from pascal's triangle, with alternating signs. Use this to substitute for y(x+7) in e. Now e is a linear combination of y(x) through y(x+6), and the 7^th difference of y. Next, write the 6^th difference as a linear combination of y(x) through y(x+6), and substitute for y(x+6) in e. Repeat this process, replacing y(x+5) through y(x+1). The result is a traditional difference equation. The order of the equation, and the dimension of the solution space, is equal to the maximum phase shift.

Consider the equation y(x) = y(x+4). The solution space has 4 dimensions, according to the values of y(0) through y(3). After that the sequence repeats. So it's easy to solve at an intuitive level, but let's solve it using the theory of exponentials.

Pretend like the phase shifted equation has been turned into a traditional difference equation of degree 4. We started with constant coefficients, so after the conversion, the coefficients will still be constants. Finite solutions are not possible in this case, so no worries about a root of -1. Repeated roots are rare, so look for a pure exponential solution. Set y = s^x, and write s^x = s^x+4, or 1 = s⁴. Let s = ±1 or ±i and find four independent solutions. These begin: [1,1,1,1], [1,-1,1,-1], [1,i,-1,-i], and [1,-i,-1,i], and repeat thereafter. Stack these up and build a 4 by 4 matrix that is vandermonde. Thus these four functions are independent, and form a basis for the solution space.

If a sequence is periodic, it is defined by an equation y(x+n) = y(x), where n is the phase shift. In the previous paragraph, n was 4. In general, the solutions are based on the n^th roots of 1 in the complex plane, also known as cyclotomic extensions. These are the n independent functions that span the function space.

Euclid (and probably his predecessors) imagined a line segment cut into two pieces of lengths x and y, where x > y, and the ratio of x+y to x equals the ratio of x to y. The total over the longer equals the longer over the shorter.

Set x = 1, hence 1+y = 1/y. Solve this using the quadratic formula, and y = ½(sqrt(5)-1). The total length, 1+y, is ½(sqrt(5)+1), which is approximately 1.618034. This constant is now called the golden ratio, and is sometimes assigned the Greek letter φ.

The Greeks, ever fascinated with ruler and compass construction, found that φ was a constructible number. (We now know that anything involving square roots is constructible; we'll see this in a later chapter.) They used φ to build the regular pentagon. If a pentagon has sides of length one, its diagonals have length φ. Let's see why this is so.

If a pentagon looks like a house, draw the two diagonals from the apex down to the corners of the floor, building an isosceles triangle that points upward. The triangles on either side are also isosceles, and congruent. Their base angles are half of 180-108, or 36 degrees. Thus the base angles of the central triangle are 108-36, or 72 degrees. We need to prove that the cosine of 72 degrees, or the sine of 18 degrees, is ½ over φ, or 1 over sqrt(5)+1.

[ pentagon with two diagonals from apex down to the corners of the floor, floor has length 1, diagonal has length φ ]

Let s and c be the sine and cosine of 18 degrees respectively. The cosine of 5×18 degrees is 0. Use the double angle formula twice for the sine and cosine of 4×18 degrees, then the angle addition formula for the sine and cosine of 5×18 degrees, to get this equation.

0 = c⁵ - 10c³s² + 5cs⁴

Divide through by c and replace c² with 1-s².

0 = 1-12s²+16s⁴

12s² = 1 + 16s⁴

12/s² = 1/s⁴ + 16

Remember that 1/s is suppose to be sqrt(5)+1. Substitute in the above and verify the equation. Both sides become 24×sqrt(5) + 72. The length of the diagonal is indeed φ.

In 1202, Fibonacci discovered the sequence that now bears his name. It is defined recursively: fib(1) = fib(2) = 1, and fib(n+1) = fib(n) + fib(n-1). The first few terms are as follows. I include fib(0) = 0 to be consistent with our notation.

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, …

[ picture of sunflower or clam shell or some example of fib in nature ]

This sequence satisfies the difference equation y(x+2) = y(x) + y(x+1). Equations of this type have exponential solutions. Replace y(x) with s^x and solve the equation s² = s+1. Apply the quadratic formula, and s = ½(1±sqrt(5)). These are φ and 1-φ respectively. These two solutions, [1,φ,…] and [1,1-φ,…], are linearly independent, and span the entire solution space.

The initial conditions set y(0) = 0 and y(1) = 1. This yields the following solution.

fib(x) = (φ^x - (1-φ)^x) / sqrt(5)

Since φ is larger than 1-φ in absolute value, the first term dominates. The fibonacci sequence approaches φ^x/sqrt(5). The ratio between successive values approaches φ, the golden ratio.

See the Jumping Peg Problem for another wonderful application of the golden ratio.

Here's an interesting theorem that you don't see every day: gcd(Fib(m),Fib(n)) = Fib(gcd(m,n)). It's certainly true when m or n is 0 or 1, so assume m and n are at least 2.

To save time and space I'm going to right f(n) for fib(n). Also, n starts at 1, not 0.

It is easy to see that the sequence runs: odd odd even odd odd even odd odd even, and so on, so that every third element is even.

Look mod 3 and the series is 1,1,2,0,2,2,1,0, repeat.

Let's generalize this to mod p. We begin 1,1,2 and so on, until that magic moment when we get a 0, if we ever do. Say the prior value is v. The next two values are v and v. After that comes 2v, then 3v, then 5v, and so on. The pattern is the same as before, except multiplied by v. Since p is prime, this isn't going to produce 0 until we reach 0*v. The previous pattern repeats, just multiplied by v. The next pattern is multiplied by v², and so on, coming back around to 1 when some power of v is 1 mod p. This is the order of v mod p.

Assuming we reach 0 mod p, there is some d such that f(d) f(2d) f(3d) and so on are divisible by p, and nothing else is. Here d is the length of the pattern mod p, from start back around to 0.

Next look at the sequence mod p^h. The sequence starts out 1,1,2,3,5 as usual, and if it becomes 0 the index is xd. It has to be a multiple of d, since those are the fibonacci numbers divisible by p. Let v be f(xd-1) mod p^h. The next two values are v and v. Note that v is not divisible by p. It is therefore relatively prime to p^h. The cycle that starts with v v is the same as the cycle that starts with 1 1 - but multiplied by v, and you don't get 0 until f(2xd). Only the multiples of xd have a fibonacci number divisible by p^h.

This assumes every p^h leads to 0 eventually. Look again at the sequence mod p^h, and suppose it never becomes 0. The possibilities are finite, so the sequence repeats a pair of numbers at some point. Perhaps it attains the values 2,7, and then later on 2,7. Reverse the process. In both cases the value before 2,7 was 5. Back up all the way to start and find two copies of 1,1. This means the cycle repeats starting at 1,1. And what comes just before the second instance of 1,1? Zero. Thus there is some fibonacci number divisible by p^h.

If p^h divides f(c) and qⁱ divides f(d), then p^hqⁱ divides f(cd). For any integer k, there is some fibonacci number divisible by k.

Those are the lemmas, now on with the proof. For any index d, f(d) is the product of various primes raised to various powers. These all divide f(2d) f(3d) f(4d) etc. If d is the gcd ( m and n, f(d) divides f(m) and f(n), and their gcd.

That's one direction. Consider the other direction. Assume p^h divides f(m) and f(n). Fibonacci numbers first become divisible by p^h at some index l, and l has to divide m and n. Thus l divides d, where d = gcd(m,n), and p^h divides f(d). Do this across all primes, and gcd(f(m),f(n)) divides f(d).

Each divides the other, giving f(gcd(m,n)) = gcd(f(m),f(n)).

This is not the only sequence that commutes with gcd. Apply the same proof to the following sequence.

f(1) = 1
f(2) = 2
f(n) = f(n-2) + 2f(n-1)

Continue the earlier notation, wherein f(n) is the n^th fibonacci number, with f(0) = 0. Recall that f(n) + f(n+1) = f(n+2). Use this to derive the following.

f(n) + 2×f(n+1) = f(n)+f(n+1) + f(n+1) = f(n+2) + f(n+1) = f(n+3)

Let M, and the powers of M, be the following matrices, which happen to have determinant 1. Use the formula above, and induction on k, to prove the result for M^k.

M		M²		M³		M⁴		M^k
2	1	5	3	13	8	34	21	f(2k+1)	f(2k)
1	1	3	2	8	5	21	8	f(2k)	f(2k-1)

Since the eigen value of M^k is the eigen value of M, raised to the k, and since each k takes two steps in the fibonacci sequence, we would expect the eigen value of M to be related to the square of the golden ratio in some way, and it is. Subtract x from the main diagonal and take the determinant, giving a characteristic polynomial of x²-3x+1. This gives an eigen value of ½(3±sqrt(5)). This is the square of ½(1±sqrt(5)), which is the base for the fibonacci sequence.