- Normed Vector Space, Banach Space
- Continuous Module
- Equivalent Norms
- Quotient Space
- Bounded Linear Operator
- Bounded Operators form a Vector Space
- The Hahn Banach Theorem
- Continuous Map Becomes Bicontinuous
- Closed Graph Theorem
- Topological Vector Space
- Finite Dimensional is Euclidean
- Locally Compact Topological Vector Space
- Hilbert Space
- Dot Product is Continuous
- Orthonormal System, Hyperbasis
- Every Linear Continuous Function is Actually a Dot Product
- Inseparable Hilbert Space

There are two ways to look at euclidean space. It is a vector space with lines and planes and scaling factors, and rigid rotations, and other transformations that respect the linear structure of the space. Or it is a space with distance, and open and closed sets, and continuous functions that respect the underlying topology.

Over the past 200 years
much has been written about vector spaces, and metric spaces -
and they almost seem like separate branches of mathematics.
But what if a space is both a vector space and a metric space, with a metric that respects the linear structure?
This is a banach space,
and it is closer to **R**^{n} than a vector space or a metric space alone.
In fact a finite dimensional banach space is equivalent to **R**^{n}.
(We'll prove this below.)
However, there are infinite dimensional banach spaces
that do not resemble euclidean space.

A normed vector space, also called a normed linear space, is a real vector space S with a norm function denoted |x|. The norm has the following properties.

1. The norm maps S into the nonnegative reals.

2. The norm of x is 0 iff x = 0.

3. For any real number c, |cx| = |c|×|x|.

4. |x+y| ≤ |x|+|y|.

Sometimes the norm is derived from a dot product, but it need not be.
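To make this concrete, here is a quick sketch in Python (the taxicab norm on **R**^{3} is my own choice of example) checking the four properties on a few sample vectors.

```python
# Sketch: the taxicab (1-)norm on R^3, checking the norm axioms on samples.
def norm1(x):
    return sum(abs(c) for c in x)

def add(x, y):
    return tuple(a + b for a, b in zip(x, y))

def scale(c, x):
    return tuple(c * a for a in x)

samples = [(0.0, 0.0, 0.0), (1.0, -2.0, 3.0), (0.5, 0.5, -1.0)]
for x in samples:
    assert norm1(x) >= 0                                    # property 1: nonnegative
    assert (norm1(x) == 0) == (x == (0.0, 0.0, 0.0))        # property 2: zero iff x = 0
    assert norm1(scale(-3.0, x)) == 3.0 * norm1(x)          # property 3: |cx| = |c|*|x|
    for y in samples:
        assert norm1(add(x, y)) <= norm1(x) + norm1(y)      # property 4: triangle
```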

This norm can be turned into a metric, thus turning our normed space into a metric space. Let the distance d(x,y) be the norm of x-y. Since x-y is -1 times y-x, property 3 above tells us d is symmetric: d(x,y) = d(y,x).

By property 2, d(x,y) = 0 iff x = y.

Let a triangle have vertices z, z+x, and z+x+y. Subtract z and apply property 4. This establishes the triangular inequality. Thus d becomes a distance metric, and S is a metric space, with the open ball topology.

A banach space is a normed vector space that forms a complete metric space. Every cauchy sequence in S converges to a limit point in S.

At this point
the word subspace has become ambiguous.
If W is a subspace of S,
is it an arbitrary subset of S that inherits the open ball topology,
or is it a vector space contained in S?
Sometimes you have to infer the correct definition from context.
The term linear subspace refers to a sub vector space.
This is also called a linear manifold,
and that's unfortunate,
since a manifold is also a space that is locally homeomorphic to **R**^{n}.
I'll stick with linear subspace.

A finite dimensional (finitely generated) linear subspace is closed in S. For example, a line is a closed set in the plane. The proof is a technical exercise in real analysis.

Choose a basis b_{1} through b_{n} for this subspace,
and build a box around the origin of dimensions d_{1} through d_{n}.
The box consists of linear combinations of b_{1} through b_{n} using coefficients bounded by d_{1} through d_{n} in absolute value.
The origin is at the center of this box, i.e. when all coefficients are 0.
Draw a segment from the origin to the edge of the box, in any direction.
The distance starts out 0 at the origin and advances linearly, as the coefficients grow,
until one of those coefficients reaches its limit.
The distance to the edge of the box is a function of direction.
The directions form a sphere of dimension n-1, and that is compact.
A tiny change in direction changes the coefficients at the edge of the box only slightly.
This adds something small to the previous location, hence it changes the distance only slightly.
In other words, the distance to the edge of the box is a continuous function of direction.

Continuous on a compact set means there is a minimum distance that is attained at a particular direction. Only the origin has a distance of 0, hence this distance is nonzero. Each box admits such a distance, and a larger box admits a larger minimum distance. Start at the point on the outer box that exhibits the minimum distance, and retrace the ray back to the origin, and find a point on the inner box with a smaller distance. Every point on the outer box is farther from the origin than the minimum distance of the inner box.

Translate these boxes to any point u in n space. The distance from u to u+z is the same as the distance from z to the origin. The boxes centered at u present the same minimum distances from u.

Now let p be a point in the closure, so that every open ball containing p intersects our n dimensional space.
Shrink these balls to zero,
and let q_{i} be a sequence of points in the subspace that monotonically approaches p.
Consider the coefficients on b that build the points in the sequence q.
Suppose the coefficients on b_{1} through b_{3} are cauchy,
and approach the real values c_{1} through c_{3},
while the coefficients on b_{4} through b_{6} are not cauchy.
Since these sequences are not cauchy, there are positive bounds d_{4} through d_{6} such that they present differences of at least 2d_{4} through 2d_{6} infinitely often.
The box on b_{4} b_{5} b_{6} with dimensions d_{4} through d_{6} has some minimum distance ε.
This implies a minimum distance of 2ε from one side of the box through the center and to the other side of the box.
Move out in the sequence q so that all points are strictly within ε/2 of p;
they are then within ε of each other.
Further move out so that the coefficients on b_{1} through b_{3} are close enough to c_{1} through c_{3}
that the difference between a_{1,i}b_{1} + a_{2,i}b_{2} + a_{3,i}b_{3} and c_{1}b_{1} + c_{2}b_{2} + c_{3}b_{3} is less than ε/2.
Focus on b_{4}, somewhat arbitrarily, and find two coefficients on b_{4} that differ by at least 2d_{4}.
This can be done since the sequence is not cauchy.
This establishes two points of q, which lie on opposite sides of the box,
presenting a distance of at least 2ε.
Add in the distance introduced by the first three coordinates,
and the first point moves by less than ε/2, and the second point moves by less than ε/2.
The points are still at least ε apart.
Yet their distance is less than ε.
This is a contradiction, hence the sequence of coefficients, on each of the basis vectors, is cauchy.

Let c_{j} be the limit of the cauchy sequence of coefficients on b_{j}.
Let r be the sum over c_{j}b_{j}.
The distance from r to p has to be 0,
else the points of q, drawing arbitrarily close to r, would remain a fixed distance away from p.
Therefore r = p, and p is part of our subspace.
The subspace is closed.

If S is not complete then complete it, and find r as above. r = p, and p was part of our original space S, so p belongs to the n dimensional subspace, and the subspace is closed.

Let u and v be points in S, and put a ball of radius ε around u+v. Points within ½ε of u, plus points within ½ε of v, wind up within ε of u+v. The preimage of the open ball about u+v is covered by open sets in S cross S. Addition is a continuous operator, and it turns S into a continuous group.

How about scaling by c?
Keep points within ε/|c| of u, and the image is within ε of cu.
(Treat c = 0 as a special case.)
Thus scaling by c is a continuous function from S onto S,
(or from S onto 0 if c = 0),
and S is a continuous **R** module.

If c is nonzero,
then keep the scaling factor close to c,
and choose δ small,
so that the open ball of radius δ about p maps entirely inside an open ball containing q, where cp = q.
Scaling is a continuous function from **R** cross S onto S,
except where c = 0.

The translate, or translation, or shift, of a set W in S, by a vector x, is the set x+y for all y in W. We're just sliding the set W along in S.

Translation is a continuous function from S into S. The distance from a to b is the same as the distance from a+x to b+x. Open balls correspond to open balls.

Since translation is a bijection, a set W is homeomorphic to any of its translates, using the subspace topology. A plane has the same open and closed sets as a shifted copy of that plane in space.

Two norms f and g on the same space S are equivalent if there are positive constants b and c such that bf ≤ g ≤ cf.

Divide by b, or c, and obtain g/c ≤ f ≤ g/b. The relation is symmetric. Set b = c = 1 to show the relation is reflexive. If bf ≤ g and dg ≤ h, then bdf ≤ h. The relation is transitive, giving an equivalence relation on norms. Norms clump together in equivalence classes, as they should, since they are called "equivalent norms".

Let's cover an open set in f with open balls in g. A point p in our open set is a certain distance d from the nearest edge, as measured by f. The points within bd of p, measured by g, are all within d of p, measured by f. So p is contained in an open g ball inside our open set. Open sets in f remain open in g, and by symmetry, open sets in g are open in f. The topologies are the same.

The identity map on S, from f to g, is uniformly bicontinuous.

As we move from f to g, cauchy sequences remain cauchy, and the limit point of our sequence becomes the limit point of the same sequence under g. If S is complete under f, it is complete under g, i.e. still a banach space.
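As a sketch (my example, not from the text): the max norm and the taxicab norm on **R**^{2} are equivalent, with constants b = 1 and c = 2.

```python
# Sketch: the max norm f and taxicab norm g on R^2 are equivalent norms,
# with 1*f(x) <= g(x) <= 2*f(x) for every x.
def f(x):  # max norm
    return max(abs(c) for c in x)

def g(x):  # taxicab norm
    return sum(abs(c) for c in x)

samples = [(1.0, 0.0), (3.0, -4.0), (-2.5, 2.5), (0.0, 0.0)]
for x in samples:
    assert 1.0 * f(x) <= g(x) <= 2.0 * f(x)
```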

A pseudo norm, without property 2, produces a pseudo metric. Collapse points that are 0 distance apart to build a new metric space. But this time S is a vector space, so there's more to the story.

If x and y are 0 distance apart, then u+x and u+y are 0 distance apart. Addition on equivalence classes is well defined. Also, addition remains commutative, and associative, and continuous, so we still have a topological group. Make similar observations for scaling, and the quotient space is a continuous module. Merge the inseparable points of a pseudo normed vector space and get a normed vector space.

Picture the z axis in 3 space. A linear transformation squashes the z axis to 0, and the result is the xy plane. We can generalize this to a normed vector space.

Let S be a normed vector space and let U be a linear subspace. Build a pseudo norm on S as the distance from x to U. Technically, this is the greatest lower bound of the distances from x to all the points of U. We need to show this is a pseudo norm.

If q is the point in U with minimum distance to p, scale p and q by c and the distance is multiplied by c. Yet the distance to every other point in U is also multiplied by c. (Everything in U is c times something else in U.) Thus q remains the closest point, and distance is scaled by c.

If there is no minimum q, let q_{i} be a sequence of points whose distance from p approaches the lower bound.
All distances are multiplied by c, and the lower bound is multiplied by c, as the sequence q_{i}*c illustrates.

The triangular inequality is inherited from S. Let x and y be any two points in S, with d(x) and d(y) their distances from U, and let p and q be points in U such that |x-p| and |y-q| are within ½ε of d(x) and d(y). The distance from x+y to p+q is now bounded by d(x) + d(y) + ε.

|(x+y) - (p+q)| = |(x-p) + (y-q)| ≤

|x-p| + |y-q| ≤ d(x) + ½ε + d(y) + ½ε = d(x) + d(y) + ε

Let ε approach 0, and the sequence of points p+q proves the norm of x+y is no larger than the norm of x plus the norm of y. We have satisfied the properties of a pseudo norm.

If p is in the closure of U then a sequence q approaches p, and p is 0 distance from U. Extend U to the closure of U; hence U is a closed set in S. Verify that this does not change the distance from x to U. If |x-p| attains the minimum distance then a sequence of points approaching p gives that same distance as a lower bound.

With U closed, a point not in U is a positive distance from U. Otherwise a sequence of points in U approaches x, and x would be included in U. Thus U, and only U, has norm 0.

If y is in U, add y to x. This does not change the set of distances from x to the points of U. The distance to p has become the distance to p+y, and so on. Thus the shifted subspace x+U is a fixed distance from U, and has a well defined norm.

Collapse the cosets of U down to single points, giving a quotient space S/U. Like U, each coset of U is 0 distance from itself, and a positive distance from everything else. Thus we are also collapsing the inseparable points, and turning the pseudo metric into a true metric. The result is both a topological quotient space and a linear quotient space. No ambiguity here; S/U is a quotient space.
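A sketch in Python, using an assumed example: S = **R**^{2} with the euclidean norm and U the x axis, so the quotient norm of a point is the absolute value of its y coordinate.

```python
import math

# Sketch: S = R^2 with the euclidean norm, U = the x axis.
# The quotient norm of x is its distance to U, which is just |x[1]|.
def norm(x):
    return math.hypot(x[0], x[1])

def qnorm(x):          # distance from x to the closed subspace U
    return abs(x[1])

samples = [(3.0, 4.0), (-1.0, 2.0), (5.0, 0.0)]
for x in samples:
    assert qnorm(x) <= norm(x)          # the quotient map is bounded by 1
    for y in samples:
        s = (x[0] + y[0], x[1] + y[1])
        assert qnorm(s) <= qnorm(x) + qnorm(y)   # triangular inequality
    assert qnorm((2.0 * x[0], 2.0 * x[1])) == 2.0 * qnorm(x)  # scaling
```

Note that (5,0) has quotient norm 0 without being 0; the cosets of U are the points of the quotient space S/U.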

If S is complete, is S/U complete? For starters, U is closed by assumption, and a closed subspace of a complete metric space remains complete. If a cauchy sequence in U converges to p, then p is in the closure of U, and is in U.

Let q_{n} be a sequence in S that becomes cauchy in S/U.
Find a point b_{n} in U, so that |q_{n}-b_{n}| in S is bounded by |q_{n}| in S/U + 1/n.
After a time, the points of q in S/U never differ by more than ε,
and moving farther out, 1/n is less than ε,
hence the points q-b in S never differ by more than 3ε.
That makes q-b cauchy in S, with a limit point r.
The difference sequence q-b comes arbitrarily close to r.
Pass to the quotient space, where b doesn't matter, and q comes arbitrarily close to r.
Each cauchy sequence has a limit, and S/U is complete.
If S is banach then S/U is banach.

A linear operator is a map between vector spaces that respects addition and scaling. Put another way, a linear operator is a module homomorphism.

Assume the domain and range are normed vector spaces. The operator f is bounded if there is some constant k such that |f(x)| ≤ k×|x|. The function does not grow faster than linear.

Note that f(0) has to be 0, but this is the case for any linear operator.

Move to a point v and find the same bound relative to v.

|f(x+v) - f(v)| = |f(x)| ≤ k×|x|.

If f is bounded it has a norm, denoted |f|, which is the lower bound of all the constants k that make f a bounded operator. This is also called the Lipschitz constant. Can we home in on |f|?

If x satisfies our constraint for a fixed k, then so does cx. To see if k is valid, there is no need to test the multiples of x. It is enough to test x/|x|, the unit vector in the direction of x. Consider all the points x on the unit sphere and evaluate |f(x)|. (I'm calling it a sphere, but I really mean all the points that are a distance 1 from the origin. This could be the surface of a cube, or almost any other shape, depending on the norm.) Let k be the least upper bound, and f is a k bounded linear operator. Lower values of k will not do, thus |f| = k.

In **R**^{n},
when f is implemented as a matrix,
you might think |f| is the largest eigenvalue, but this need not be the case.
Let f be the 2×2 upper triangular matrix [1,1|0,1].
Both eigenvalues are 1, but run 0,1 through the matrix and get 1,1 with length sqrt(2).

If f is a normal matrix, e.g. a symmetric matrix, then its eigenvectors are orthogonal, and |f| is indeed the largest eigenvalue. Of course we have to take the absolute values of the eigenvalues, so that -4 is bigger than 3, forcing k = 4.
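Here is a quick numerical sketch of the matrix example, estimating |f| by sampling the unit circle. The true value is the largest singular value, sqrt((3+sqrt(5))/2) ≈ 1.618, well above the eigenvalue 1.

```python
import math

# Sketch: the operator norm of the matrix [1,1|0,1] under the euclidean norm,
# estimated by sampling the unit circle. Both eigenvalues are 1, yet |f| > 1.
def apply(m, x):
    return (m[0][0] * x[0] + m[0][1] * x[1],
            m[1][0] * x[0] + m[1][1] * x[1])

m = ((1.0, 1.0), (0.0, 1.0))
best = 0.0
for k in range(10000):
    t = 2.0 * math.pi * k / 10000
    y = apply(m, (math.cos(t), math.sin(t)))
    best = max(best, math.hypot(y[0], y[1]))

# best is close to the largest singular value, about 1.618.
assert best > 1.5
```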

If f is the quotient map S mod a closed subspace U,
which is a linear map,
as described in the previous section,
the bound on f is 1.
By definition the distance from x to U has to be |x-0| or less.
If x remains nonzero in S/U
then let q_{i} be a sequence x-b_{i} whose distances approach the distance d from x to U.
|q_{i}| is arbitrarily close to d, with |f(q_{i})| = d, hence |f| = 1.

Continuity of f is demonstrated at 0. Assume a ball of radius ε in the range includes the image of a ball of radius δ in the domain. Move these balls to v and f(v). With |b| < δ, f(v+b) = f(v) + f(b) which is within ε of f(v).

Though it is not linear, norm is continuous from S into the reals. Place an interval around |v| of radius ε. Keep |b| below ε, and by the triangular inequality |v+b| lies inside our open interval.

Assume f is continuous, and the unit sphere is compact. Norm is continuous, as described above, so |f(x)| is a continuous function on the unit sphere. This is a continuous function from a compact set into the reals. The image is compact, hence closed and bounded. The linear operator f is bounded.

Apply the above when the domain is **R**^{n}.
The unit sphere is closed and bounded in **R**^{n}, hence compact.
We only need show continuity.
Focus on one of the n coordinates.
Our linear operator is continuous on **R**;
in fact it scales **R** by a fixed amount and embeds it in the range.
A linear operator on **R**^{n} is the sum of n linear operators on **R**, and is continuous.
Thus we have a continuous function on a compact set,
and every linear operator on **R**^{n} is bounded.

In fact f is continuous iff it is bounded. If f is bounded, f is continuous at 0, hence continuous. In fact f is uniformly continuous. Distance is magnified by at most k, everywhere. Conversely assume f is continuous. Select an r so that |x| < r implies |f(x)| < 1. The norm of the image of the sphere of radius r is at most 1, hence 1/r acts as a bound for f.

It's easy to build a linear function that is not bounded, and not continuous.
Let b_{1} b_{2} b_{3} etc form a basis of unit vectors for an infinite dimensional normed vector space.
Let f map b_{1} to b_{1}, b_{2} to 2b_{2}, b_{3} to 3b_{3}, b_{4} to 4b_{4}, and so on.
The image of the j^{th} unit vector has length j, and f is unbounded.
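A sketch of this unbounded operator in Python, representing finitely supported vectors as dicts of coefficients and using the sup norm (my choice of representation):

```python
# Sketch: an unbounded linear operator on finitely supported sequences
# under the sup norm; e_j, the j-th unit vector, is the dict {j: 1.0}.
def norm(x):
    return max(abs(c) for c in x.values()) if x else 0.0

def f(x):                       # maps b_j to j*b_j
    return {j: j * c for j, c in x.items()}

# The image of the j-th unit vector has norm j: no single k bounds f.
for j in (1, 10, 1000):
    e_j = {j: 1.0}
    assert norm(e_j) == 1.0
    assert norm(f(e_j)) == float(j)
```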

If A and B are vector spaces then the linear operators from A into B form another vector space. This is because linear functions can be added and scaled.

Linear functions from A into B are sometimes denoted hom(A,B). The word "hom" is short for homomorphism, because these functions are module homomorphisms from A into B.

Let A and B be normed spaces, and note that the bounded operators from A into B form a vector space. Scale a transformation f and you scale its norm |f|. The norm of f+g is the supremum of |f(x)+g(x)| over the unit sphere, which is no larger than the supremum of |f(x)|+|g(x)|, which is no larger than |f| + |g|. The set of bounded homomorphisms from A into B is denoted boundhom(A,B). This structure is another normed vector space, via |f|. We just proved the triangular inequality and scaling. If |f| = 0 then f is identically 0, so we are done.
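A numerical sketch (the matrices f and g below are my own examples), estimating the three operator norms by sampling the unit circle; the triangular inequality for |f+g| holds on the estimates.

```python
import math

# Sketch: estimating |f|, |g|, and |f+g| for two 2x2 matrices by sampling
# the unit circle; the triangular inequality |f+g| <= |f| + |g| holds.
def apply(m, x):
    return (m[0][0] * x[0] + m[0][1] * x[1],
            m[1][0] * x[0] + m[1][1] * x[1])

def est_norm(m, n=1000):
    best = 0.0
    for k in range(n):
        t = 2.0 * math.pi * k / n
        y = apply(m, (math.cos(t), math.sin(t)))
        best = max(best, math.hypot(y[0], y[1]))
    return best

f = ((1.0, 2.0), (0.0, 1.0))
g = ((0.0, -1.0), (3.0, 1.0))
h = tuple(tuple(f[i][j] + g[i][j] for j in range(2)) for i in range(2))

assert est_norm(h) <= est_norm(f) + est_norm(g) + 1e-9
```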

If B is complete then so is boundhom(A,B).
Let f_{1} f_{2} f_{3} etc be a cauchy sequence of bounded linear operators.
Define a new function g as follows.
Let g(0) = 0.
For x on the unit sphere, consider the sequence f_{n}(x) in B.
The difference between two functions, on the unit sphere, is bounded by the norm of their difference,
which is the "distance" between the two functions.
In a cauchy sequence this distance shrinks to 0.
For any ε, we can move down the sequence, and keep |f_{i}-f_{j}| below ε.
This keeps f_{i}(x)-f_{j}(x) below ε.
The sequence of images of x is cauchy, and converges to some limit in B,
which becomes g(x).

Is g a linear function?
Consider x and y in A.
Since each f is linear, the sequence f_{n}(x+y) is the sum of the individual sequences, and the limit of f_{n}(x+y),
also known as g(x+y), is the sum of the limits, or g(x)+g(y).
In short, the limit of the sum is the sum of the limits in a metric space.
Similar reasoning shows g respects scaling by c, hence g is linear.

Let's show g is the limit of our sequence f_{n}.
Remember, |g-f_{n}| is the distance metric.
For each x on the unit sphere, g(x) is the limit of f_{n}(x).
Find an n so that functions beyond n are within ε of each other, all over the unit sphere.
Since the later functions stay within ε of f_{n}(x), their limit g(x) is at most ε from f_{n}(x).
This holds for all x on the sphere,
and keeps the norm of g-f_{j} ≤ ε for j beyond n.
This holds for each ε, hence f converges to g.

Set ε to 1,
and g is within 1 of some bounded function f_{n}.
This makes g a bounded function.
Every cauchy sequence converges, and boundhom(A,B) is complete.

Equivalent norms on B lead to equivalent norms on the space of bounded functions from A into B.

A functional f is a linear map (respecting scaling and addition) from a real vector space S into the reals.
If you think of S as an **R** module,
then a functional on S belongs to the dual of S.
The following theorem extends a functional from a subspace T up to a larger space S.

Let S be a normed vector space and let T be a linear subspace of S. If f is a linear functional from T into the reals satisfying f(x) ≤ |x|, then f can be extended to all of S, with the same constraint f(x) ≤ |x|.

By zorn's lemma, let U be a maximal subspace of S containing T to which f extends, subject to f(x) ≤ |x|.
Suppose U is not all of S, so that y is a point in S-U.
For any two points u_{1} and u_{2} in U:

f(u_{1}) + f(u_{2}) =
f(u_{1}+u_{2}) ≤

|u_{1}+u_{2}| =
|u_{1}-y + u_{2}+y| ≤

|u_{1}-y| + |u_{2}+y|

Put this all together and get this.

f(u_{1}) + f(u_{2}) ≤ |u_{1}-y| + |u_{2}+y|

f(u_{1}) - |u_{1}-y| ≤ |u_{2}+y| - f(u_{2})

View the left side as a function of U, and the right side as a function of U,
and consider the respective images of U in **R**.
The image of the left function is below, or at worst shares a boundary point with,
the image of the right function.
Choose a real number e between these images.
If the two images share a boundary point, let e be this boundary point.
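In symbols, restating the two inequalities above, e is any real number with

sup ( f(u) - |u-y| ) ≤ e ≤ inf ( |u+y| - f(u) )

where u ranges over U on both sides.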

Set f(y) = e. By linearity, this defines f on the span of U and y.

Let's check our constraint. In the following, x is any point in U, and c is positive. Verify x+cy first, then x-cy.

f(cy+x) =

ce+f(x) =

c × (e + f(x/c)) ≤

c × (|x/c+y| - f(x/c) + f(x/c)) = { substituting for e }

c × |x/c+y| =
|x+cy|

f(x+cy) ≤ |x+cy|

f(x-cy) =

-ce+f(x) =

c × (-e + f(x/c)) ≤

c × (|x/c-y| - f(x/c) + f(x/c)) = { substituting for -e }

c × |x/c-y| =
|x-cy|

f(x-cy) ≤ |x-cy|

We have extended f to a subspace beyond U, and this contradicts the maximality of U. Therefore f extends to all of S, and f(x) ≤ |x| everywhere.

In the above, the selection of e might be forced, if the two images of U have a point in common, but more often there is a gap between the two images, providing some wiggle room. We can generalize this theorem by adding more requirements to f. This may close the gap, but it is still possible to select e, and extend f to all of S.

Let S, T, and f(T) be as above. In addition, assume there is an abelian monoid of bounded linear operators from S into itself. Don't worry about the word monoid; it just means the set contains the identity and is closed under composition in the usual way. Follow one operator with another and the resulting linear operator is bounded. If the first bound is k and the second is l, then distance is multiplied by at most k, and then at most l, thus a bound of kl. Since the monoid is abelian, two operators can be composed in either order and the result is the same. If they are implemented by matrices, for example, the matrices must commute. That's not typical, but there are sets of matrices that do commute, such as the powers of a fixed matrix: multiply a^{i} and a^{j} in either order and get a^{i+j}.

Assume these commuting operators are bounded by 1, i.e. they never expand distance in S. Since the composition of two such operators is still bounded by 1, we're all right.

The operators also map T into itself, and preserve the value of f. If a is one of our operators, write the constraints this way.

|a(x)| ≤ |x|

x ∈ T → a(x) ∈ T

x ∈ T → f(a(x)) = f(x)

There is an extension of f to all of S, with f(x) ≤ |x|, and f(a(x)) = f(x), for every x in S and all the operators a in our monoid.

Wow - that was just the statement of the theorem - now for the proof.

Consider all finite sets of operators taken from the monoid. Given a set of n operators, find the n images of x, add them up, take the norm in S, and divide by n. Let q(x) be the lower bound of this "average", across all finite sets of operators. Since norm is at least 0, the average is at least 0, and the greatest lower bound is well defined.

The monoid includes the identity map. Select this single operator, and the average norm is simply |x|. Therefore q(x) ≤ |x|.

We will see that q(x) has most of the properties of a norm. Since q is derived from the norm in S, q(cx) = |c|×q(x). Showing the triangular inequality requires more work.

Choose a finite set of n operators that keeps the average of x below q(x)+ε. Find a second set of m operators that keeps the average of y below q(y)+ε. Now build a set of m×n operators that is the cross product of these, an operator from the first set composed with an operator from the second. This is a finite set from our monoid, so compute the sum of each applied to x+y, then take the norm, then divide by mn.

The operators are linear,
so replace each a_{i}b_{j}(x+y) with a_{i}b_{j}(x) + a_{i}b_{j}(y).
Instead of taking the norm of the entire sum, take the norm of the first sum over x,
then take the norm of the second sum over y,
then add these norms together, then divide by mn.
Thanks to the triangular inequality, this can only make things bigger.

The monoid is abelian, so we can apply the operators in either order. When the sum is applied to x, apply the n operators from the first set and add up the images of x. Call this intermediate result w. By assumption, |w|/n < q(x)+ε. Now each operator from the second set is applied to w, and the images are added. Once again, things only get bigger if we run w through each operator in turn, take norms, and add up those norms. Since the operators are bounded by 1, the norm is not increased by any operator in the second set. We can just skip that step. Thus the norm of the sum over x is below mn×(q(x)+ε).

Similar reasoning shows the norm of the sum over y is below mn×(q(y)+ε).

Add these together and divide by mn, and the result is below q(x)+q(y)+2ε. Let ε approach 0, and q(x+y) ≤ q(x)+q(y).

Watch what happens when x lies in T. Consider a finite set of n operators. Let w be the sum of the images of x under these operators. Since the operators do not change the value of f, f(x) = f(w)/n. We also know that f(w) ≤ |w|. Therefore f(x) ≤ |w|/n. This holds for all finite sets of operators from the monoid, hence f(x) ≤ q(x).

Go back up to the top of this section and apply the Hahn Banach theorem, using q(x) in place of the norm of x in S. You'll see that q has all the properties we need: the triangular inequality, scaling by positive constants, and f(x) ≤ q(x) on T. Therefore f extends to a linear function on S with f(x) ≤ q(x).

Since q(x) ≤ |x|, we have f(x) ≤ |x|. We only need show f is preserved by the operators in our monoid, as was the case for T.

Let a() be any operator, and x any element of S.
Select the finite set of n operators 1, a, a^{2}, a^{3}, … a^{n-1}.
Apply these operators to x, add up the images, take the norm, and divide by n.
This gives the "average", and it bounds q(x).

q(x) ≤ |x + a(x) + a^{2}(x) + …| / n

This holds for any x, so apply the inequality to a(x)-x. The left side becomes q(a(x)-x). The right side telescopes down to two terms, and looks like this.

q(a(x)-x) ≤ |a^{n}(x) - x| / n
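To see the telescoping, apply each operator in the set to a(x)-x and add:

(a(x)-x) + (a^{2}(x)-a(x)) + (a^{3}(x)-a^{2}(x)) + … + (a^{n}(x)-a^{n-1}(x)) = a^{n}(x) - x

The intermediate terms cancel in pairs.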

The right side only gets bigger if we replace the norm with |a^{n}(x)| + |x|.

Since a is bounded by 1, |a^{n}(x)| is no larger than |x|.
This gives 2×|x|/n, which approaches 0 for large n.
Therefore q(a(x)-x) ≤ 0.

Since f is bounded by q, f(a(x)-x) ≤ 0. The same is true of -x, and since all functions are linear, we can pull -1 out, giving -f(a(x)-x) ≤ 0. This means f(a(x)-x) ≥ 0. Combine these results and f(a(x)-x) = 0. This implies f(a(x)) = f(x), and that completes the proof.

A continuous linear map from one banach space onto another is bicontinuous.

The word onto is important here. Embed the x axis into the plane, and the open interval (0,1) maps to a set that is neither open nor closed in the plane. This is not a bicontinuous map.

If f is bicontinuous then an open ball centered at the origin maps to an open set containing the origin. This means the image encloses the origin, i.e. it contains a ball about the origin. Conversely, assume every open ball centered at the origin has an image that encloses the origin. The image of an open ball at x is, by linearity, f(x) plus the image of the same open ball at 0. Thus the image of the open ball at x includes an open ball about f(x). Apply this to every x in an open set O in the domain, choosing a ball about x that lies in O, and considering just the open ball about f(x). The image of O is a union of these open balls in the range and is open. This makes f bicontinuous. We only need prove the open ball property at 0.

Cover the domain with open balls centered at the origin having radius k, for all positive integers k. Let W be the image of the open unit ball in the range. Let kW be the image of the ball of radius k. Note that kW is in fact all the points of W multiplied by k, which sort of justifies my notation.

The images kW, for all k, cover the range. This because f maps onto the range.

If W′ is the closure of W, verify that k times W′ is the closure of kW. Since multiplication by k implements a homeomorphism, a point p is in an open set missing W iff kp is in an open set missing kW, hence k×W′ = (kW)′. Since there is no ambiguity, I'll just write kW′.

Suppose W is a nowhere dense set. Every open ball misses W, or contains a smaller open ball that misses W.

Given an integer k, consider kW, and an open ball. Contract everything by k, and the open ball pulls back to a smaller open ball nearer the origin. This open ball contains some open ball that misses W, and when we expand by k again we find an open ball that misses kW. Therefore kW is also a nowhere dense set.

The range is now the countable union of nowhere dense sets, and that makes it first category. However, a complete metric space is second category. This is a contradiction, hence W is not a nowhere dense set.

If W′ is the closure of W, then W′ contains an open ball of radius r, centered at c. If c is not in W then c is in the closure, and that means points of W approach c. Pick a point in W that is close to c, inside the ball of radius r, and relabel this point as c, the center of a new ball with a smaller radius r, that lies in W′. Now we know c is in W, and c has some preimage d in the ball of radius 1. When f is applied to the unit ball - d, a translate of the unit ball, the result is a translate of W that carries c to 0. Translates are homeomorphic, hence the closure of W-c contains a ball of radius r about 0. When subtracting d from the open ball of radius 1 in the domain, the result lies in the open ball of radius 2. Therefore 2W′ contains an open ball of radius r about the origin. Scale this by any real number, and the closure of the image of every open ball at 0 includes an open ball about 0.

For notational convenience, let W′ contain an open ball of radius r about 0. Thus kW′ contains an open ball of radius kr about 0.

Let y_{0} be any point in the range with |y_{0}| < r.
Thus y_{0} lies in the ball of radius r about 0, which is contained in W′.
We're going to build a series of vectors x_{i} in the domain,
as i runs from 1 to infinity,
summing to x_{0}, and their images y_{i} in the range will sum to y_{0}.
This is where continuity finally comes into play; f(x_{0}) = y_{0}.
Here we go.

Remember that points of W approach y_{0}.
Let y_{1} be a point in W that is within r/2 of y_{0},
and let x_{1} be a preimage of y_{1}.
Thus |x_{1}| < 1.

Let d_{1} = y_{0}-y_{1}.
The point d_{1} is in ½W′.
Since d_{1} is in the closure of ½W, find y_{2}
within r/4 of d_{1}, and let f(x_{2}) = y_{2}.
Add y_{2} to y_{1}: since y_{1} + d_{1} = y_{0}, the sum y_{1} + y_{2} misses y_{0} by an extra piece,
the difference between d_{1} and y_{2}, which is bounded by r/4.

Let d_{2} = d_{1}-y_{2}.
This is the extra piece I talked about.
Since d_{2} is in ¼W′,
find y_{3} within r/8 of d_{2}, and f(x_{3}) = y_{3}.
Add y_{3} to (y_{1} + y_{2}) and retrace d_{2} back to y_{0}, and then the extra piece bounded by r/8.
This extra piece is the difference between d_{2} and y_{3}.

Set d_{3} = d_{2}-y_{3}, find y_{4} and x_{4}, and so on.
Continue this process, building a sequence x in the domain and y in the range.
The x sequence approaches 0 geometrically.
The partial sums of x form a cauchy sequence, with a limit that I will call x_{0}.
Meanwhile the partial sums of the y sequence approach y_{0}.
Since f is continuous,
f(x_{0}) = y_{0}.

How far is x_{0} from the origin?
Since x_{1} lies in the preimage of W, inside the unit ball, its norm is less than 1.
Similarly, x_{2} has norm less than ½, x_{3} has norm less than ¼, and so on.
Thus |x_{0}| < 2.
This holds for every y_{0} with |y_{0}| < r,
hence the open ball of radius r is contained in the image of the open ball of radius 2.
Scaling,
the open ball of radius r/2 is contained in W, the image of the unit ball.
This proves f is an open map.

If f is also injective it is a homeomorphism.

If the vector space S is complete with respect to two different norms f and g, and f(x) ≤ c×g(x) for some constant c, then the norms are equivalent. A ball of radius ε via f wholly contains a ball of radius ε/c via g. The identity map from g to f is injective, continuous, and onto, hence a homeomorphism by the above theorem. The inverse map is continuous, hence bounded. The bound gives a constant b satisfying g(x) ≤ b×f(x), and as per an earlier theorem, the norms are equivalent.
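As a concrete illustration, here is a small numerical sketch in **R**^{n}. The two norms (euclidean and taxicab) and the constants c = 1 and b = sqrt(n) are my choices for the example, not part of the theorem; in finite dimensions the two-sided bound can be written down directly.

```python
import math, random

def norm2(x):   # euclidean norm, playing the role of f
    return math.sqrt(sum(t * t for t in x))

def norm1(x):   # taxicab norm, playing the role of g
    return sum(abs(t) for t in x)

random.seed(1)
n = 5
c = 1.0             # f(x) <= c*g(x): the 2-norm never exceeds the 1-norm
b = math.sqrt(n)    # g(x) <= b*f(x): the reverse bound in n dimensions

for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    assert norm2(x) <= c * norm1(x) + 1e-9
    assert norm1(x) <= b * norm2(x) + 1e-9
print("equivalent norms, constants", c, "and", round(b, 4))
```

The theorem above is interesting precisely because, in infinite dimensions, the reverse bound b is delivered by completeness rather than by direct computation.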

Let f be a linear map from one banach space S into another banach space T.
Suppose that whenever a sequence x_{n} converges to x in S,
and f(x_{n}) converges to y in T,
we have f(x) = y; in other words, the graph of f is closed.
Then f is continuous.

Define a new norm r on S as r(x) = sqrt(|x|^{2} + |f(x)|^{2}).
This is basically the euclidean formula for distance in two dimensions,
so it satisfies the properties of a norm.

Start with a cauchy sequence under the norm r, and select
ε and n so that terms beyond x_{n} differ by no more than ε.
Since distance under |S| or |T| is never larger than distance under r(),
the terms beyond x_{n} differ by no more than ε under |S|,
and their images differ by no more than ε under |T|.
Both x_{n} and f(x_{n}) are cauchy, and converge to x and y.
By assumption y = f(x), so x is the limit of the sequence under the metric r().
In other words, S is still a complete banach space.

Apply the identity map on S from |S| to r(S). The former metric is always bounded by the latter, so by the previous theorem, the map is a homeomorphism, continuous in both directions, and there is a reverse bound b satisfying r(x) ≤ b×|x|. Yet the metric of f(x) in T is bounded by r, so distance in T (via f) is bounded by b times distance in S, and f is continuous.
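The graph norm r is easy to experiment with. Below is a sketch in **R**^{2} with a hypothetical linear map f standing in for f : S → T; it checks the norm axioms for r, and that r dominates the original norm.

```python
import math, random

def f(x):   # a hypothetical linear map on R^2, standing in for f : S -> T
    return [2 * x[0] + x[1], x[0] - 3 * x[1]]

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def r(x):   # the graph norm: r(x) = sqrt(|x|^2 + |f(x)|^2)
    return math.sqrt(norm(x) ** 2 + norm(f(x)) ** 2)

random.seed(2)
for _ in range(1000):
    x = [random.uniform(-5, 5), random.uniform(-5, 5)]
    y = [random.uniform(-5, 5), random.uniform(-5, 5)]
    c = random.uniform(-5, 5)
    assert abs(r([c * t for t in x]) - abs(c) * r(x)) < 1e-9       # |cx| = |c||x|
    assert r([a + b for a, b in zip(x, y)]) <= r(x) + r(y) + 1e-9  # triangle
    assert norm(x) <= r(x) + 1e-12                                 # |x| <= r(x)
print("graph norm verified")
```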

A topological vector space is a vector space with a topology, such that addition and scaling are continuous. A normed vector space is a topological vector space, deriving its topology from the metric. But there could be other topological vector spaces that are not metric spaces.

A topological vector space satisfies certain criteria, which will be presented below. As you might imagine, these criteria deal with open sets. There is no metric, no notion of distance, so that only leaves open and closed sets and the operations of the vector space.

In a metric space, the translate of an open ball is an open ball, since the distance between two points does not change; but in a topological group we have to use continuity to prove translation preserves open sets. If S is our topological group, addition from S cross S onto S is continuous. Let U be an open set and let V = U-x. Thus V is the preimage of U under translation by x. The preimage of U under addition is open, and is covered with base open sets. Remember that base open sets in the product are open sets cross open sets. For any y in V, x cross y is covered by an open set cross an open set. The open set that contains y has to lie in V, else its translate by x would not lie in U. Therefore V is covered by open sets and is open, and translation by x is continuous. Translation by -x is also continuous, hence translation is a homeomorphism on S.

This works even if S is a nonabelian group. Translation by x, on either side, implements a homeomorphism on S, and a subgroup of S with the subspace topology is homeomorphic to all its translates.

When S is a topological module, a nearly identical proof shows that scaling by the units of the base ring is a homeomorphism. Of course our ring is a field, and every nonzero real number is a unit, so scaling by a nonzero constant carries open sets to open sets. Scaling by 0 is continuous, but certainly not a homeomorphism.

Let S be a topological group with a local base at 0. If an open set O contains x, O-x encloses 0, and contains an open set U from our local base. Thus O contains U+x contains x, and the translate of the local base at 0 gives a local base at x. This holds for all x, hence the translates of the local base at 0 give a base for the topology.

What's wrong with the discrete topology, where every set is open?
Nothing, if S is a group, but watch what happens when S is a vector space.
Consider the multiples of a nonzero vector x by the reals in [0,1].
This is the smear of x by a closed interval.
The set is open of course, and by continuity of scaling its preimage is open.
Let y be a real number in [0,1]; then y cross x lies in an open set cross an open set,
where the open set containing y lies in [0,1].
So [0,1] is covered by open sets and is open; yet it is closed,
which is impossible, since **R** is connected.
The topology of S is restricted by the topology of **R**.

The following properties turn a vector space into a topological vector space. They deal with the local base at 0, which is sufficient to describe the entire topology. The variables U V and W represent base open sets at 0.

Every open set containing 0 contains some U.

If x is a point in an open set then there is some U with x+U in that open set. (Use x*U and V*x for a nonabelian group.)

There is some V with V+V in U.

For any point x and any U, there is a nonzero constant c with cx in U.

If 0 < |c| ≤ 1 then c×U lies in U and is a member of the base.

Condition 1 builds a local base at 0, and condition 2 moves that local base to any point x, thus building a base for the topology. Conversely, if S is a topological group, then the open sets containing 0 form a local base, and this or any other local base can be translated to any x.

Condition 3 implies, and is implied by, the continuity of addition from S cross S onto S. Assume the latter, and let U contain 0. Look at the preimage of U under addition and find V cross W, containing (0,0), whose sum V+W lies in U. The intersection of these two base open sets is another open set containing 0, which contains a base open set. Relabel this as V, and V+V lies in U.

Conversely, assume condition 3, and consider the preimage of x+y+U, where x+y+U sits inside a larger open set containing x+y. Find V such that V+V lies in U. Now x+V + y+V lies in x+y+U, so x cross y is contained in an open set that maps into x+y+U; the preimage is open, and addition is continuous. Thus a topological abelian group is equivalent to conditions 1, 2, and 3.

Assume scaling by **R** is continuous, hence continuous at 0.
The preimage of the base open set U is open.
An open interval about 0, cross an open set about x, maps into U.
In particular, some nonzero constant times x lies in U, which is condition 4.

A local base need not satisfy condition 5, but it's possible to find a new local base that does. Start with W, a set in the local base. Continuity of scaling at 0 means an open interval (-e,e) times an open set V winds up in W. If e is 1 or greater, ratchet e down to any number below 1. Now c×V lies in W whenever |c| < e. Let U be the union of c×V for all c with |c| < e. Now U is an open subset of W. Replace W with U, and do the same for every other open set in the local base. Shrinking the open sets in a local base preserves the property of being a local base. Furthermore, each such set, multiplied by a constant below 1, produces a subset of the original. We are simply taking the union of fewer instances of c×V. Thus a topological vector space implies all 5 conditions.

If you don't want to mess with the axiom of choice then do this. Consider every pair (open interval cross base open set) wherein the product lies in W. For each such pair, reduce e down to a power of 2 below 1, such as ½, ¼, etc. Take the union of c*V for c in (-e,e) as we did above, then take the union across all pairs. The result, call it U, lives in W, and is the new base open set that takes the place of W. All base sets are replaced at once. Each is smaller than its predecessor, thus building a new base, and each satisfies condition 5.

Finally assume all 5 conditions hold.
We already said conditions 1, 2, and 3 produce a topological group;
we only need show scaling by **R** is continuous.
Consider a base open set about cx, namely cx+U.
Let V be a base open set such that V+V+V lies in U.
Choose a real number d, with 0 < d ≤ 1, such that dx lies in V.
(Condition 4 provides such a d, condition 5 lets us shrink it below 1,
and every e with |e| < d has ex in V.)
Multiply the interval (c-d,c+d) by the open set x+V/c,
or by x+V if |c| < 1.
This is the union of cx + ex + (c+e)(v/c) for e in (-d,d) and v in V.
The result lies in cx+U, and contains cx, hence multiplication by **R** is continuous.

What goes wrong if S is discrete?
We already said it can't be a topological vector space with scaling by **R**.
The local base at 0 has to include the open set {0}, to cover the open set {0}.
And this is all we need for the local base, and the base.
Conditions 1, 2, and 3 are satisfied, but 4 is not.

Review the separation axioms,
and assume S is T_{0}.
Some open set contains one of the two points and misses the other; say it contains x and misses y.
Select U from the local base so that x+U misses y.
Thus U does not contain y-x.
Remember that -U is an open set (scaling by -1 is a homeomorphism), hence y-U is an open set about y;
and it misses x, for if x were in y-U then y-x would be in U.
If S is T_{0} it is T_{1}.

A space need not be T_{0}.
Within **R**^{2}, let the vertical stripes -e < x < e form a local base about 0.
Verify that all 5 properties are satisfied.
Yet points along the y axis cannot be separated.

If points are inseparable they can be clumped together to produce a quotient space. Does this disturb the structure of the group? If inseparable points become separated after translation, translate back and an open set contains one and not the other. Similar reasoning holds for scaling. Thus the quotient space is also a quotient module, in this case a vector space. In the above example the kernel is the y axis, and the quotient space is the x axis, which is indeed a topological vector space.

Henceforth we will assume inseparable points have been clumped together, and our topological vector space is T_{1}.

If x+U misses y, let V+V lie inside U,
and suppose x+V and y-V intersect.
Now x plus something in V yields y minus something in V,
hence x plus something in U yields y, which is a contradiction.
Therefore our topological vector space is T_{2}, or hausdorff.

In a metric space, a map is uniformly continuous if every ε has its δ. In a normed vector space, it is enough to look at balls about the origin, since the local base defines the base. Every ball of radius ε pulls back to a ball of radius δ. Of course this is required for continuity at 0, and we already showed that continuity at a point is the same as uniform continuity everywhere. All this can be generalized to topological vector spaces.

Let V be an open set about 0 in the range and pull back, by continuity, to U, an open set about 0 in the domain. Here V plays the role of ε and U plays the role of δ. Add x to the domain and x+U maps into f(x)+V. Given V, one U applies across the domain, and the function is uniformly continuous.
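For a bounded linear map the single δ can be written down explicitly: if |f(x)| ≤ b|x| then δ = ε/b works at every point. A numerical sketch, with a hypothetical map and the crude bound b = 4:

```python
import math, random

def f(x):   # a hypothetical linear map on R^2
    return [x[0] + 2 * x[1], 3 * x[0] - x[1]]

def norm(x):
    return math.sqrt(sum(t * t for t in x))

bound = 4.0          # |f(x)| <= 4|x|; sqrt(1+4+9+1) < 4 bounds this map
eps = 0.01
delta = eps / bound  # one delta serves the entire domain

random.seed(3)
for _ in range(1000):
    x = [random.uniform(-100, 100), random.uniform(-100, 100)]
    h = [random.uniform(-1, 1), random.uniform(-1, 1)]
    s = delta / (2 * norm(h))          # shrink h to lie inside the delta ball
    h = [s * t for t in h]
    gap = [p - q for p, q in zip(f([a + b for a, b in zip(x, h)]), f(x))]
    assert norm(gap) < eps             # the image moved less than epsilon
print("uniform continuity: delta =", delta)
```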

Every finite dimensional topological vector space S is homeomorphic to **R**^{n}.
Remember, we're assuming S is hausdorff.

A normed vector space is hausdorff, so every finite dimensional normed vector space is homeomorphic to **R**^{n}.
But we'll prove the more general assertion, regarding topological spaces.

Select a basis for S and build a linear map f from **R**^{n} onto S.
If f is bicontinuous, then the spaces are indeed homeomorphic.

Let U be an open set about 0 in S.
Choose V so that V+V+V… n times lies in U.
(U and V are part of the local base at 0, as described in the previous section.)
Let x be the image of a coordinate unit vector of **R**^{n} in S,
and let cx lie in V.
Thus the image of (-c,c) lies in V.
Find such a constant for each dimension and build an open rectangular box in **R**^{n}.
This maps into V+V+V…, or U.
Thus f is continuous at 0, and f is continuous everywhere.

Now for the converse.
Let Q be the unit sphere in **R**^{n}, and let B be the unit open ball in **R**^{n}.
The image of Q is a compact subspace of a hausdorff space,
hence it is closed.
The complement is open, so let U be a base open set about 0 that lies in the complement.
Remember, our base open sets shrink when multiplied by a constant less than 1.
Suppose y lies in U, where y = f(x), yet x is not in B.
Scale x down to a unit vector,
and y remains in U, even though it is now in the image of Q.
This is a contradiction, hence all of U maps into B
via f inverse.
This holds for a ball of any radius, thus the inverse of f is continuous, and f is bicontinuous,
and S is homeomorphic to **R**^{n}.

We showed earlier that a finite dimensional subspace of a banach space is closed -
how about a topological vector space?
Let S be a topological vector space and let T be a finite dimensional subspace.
Of course T looks just like **R**^{n}.

For any x not in T, T and x span a subspace of dimension n+1. This also looks like euclidean space, hence x can be placed in an open set that misses T. This comes from an open set in S that misses T, thus x is not in the closure of T, and T is closed.

Every open set of S includes an n dimensional ball.
As usual we'll prove this for a base set about 0,
whence it applies to every open set by translation.
Assume S supports at least n independent vectors.
These combine to build euclidean space.
Find a base set V such that V+V+V… n times lies in U.
Scale each of the n unit vectors so that it lives in V.
In other words, c_{i}b_{i} lives in V, where the b_{i} form the basis for n space.
Add V to itself n times, and all linear combinations of the basis vectors,
with coefficients ranging from -c_{i} to c_{i}, live in U.
This is a box, and of course it contains an open ball.

If S is
locally compact then it is homeomorphic to **R**^{n}.
Prove S is finite dimensional and apply the earlier result.

Select an open set about 0 whose closure is compact. This is the definition of locally compact. Let U be a base set in this open set. Now the closure of U is closed in a compact set, hence it is also compact.

Let U contain the nonzero point x. Remember that c×U lies in U for |c| ≤ 1. Thus U contains the smear of x from -1 to 1. But suppose each c×U contains x; then U contains x/c, and U contains all the multiples of x, a line in our topological vector space. Being hausdorff, some V separates 0 from x. Let V be in the local base, so that V intersects the line of x in (-e,e)×x, where e < 1. The translates of V cover U closure. A finite subcover will do, yet a finite subcover cannot cover all the multiples of x. This is a contradiction; therefore we can always scale U down to exclude any given nonzero point x.

Let the translates of U/3 cover U closure. Note that we only need translate by elements of U. If p is in U closure, then p-U/3 is an open set containing p, so it intersects U in some point q, and q+U/3 brings in p. Thus translations by elements of U will suffice.

Let x_{1} x_{2} x_{3} … x_{n} define the translates of U/3 that form the finite subcover.

Let z be any point in U
and assume, without loss of generality,
that z is in the translate x_{1}+U/3.
The difference z-x_{1} lies in U/3, hence 3 times this difference still lies in U.
Let 3(z-x_{1}) lie in some translate x_{2}+U/3.
Now the difference between 3(z-x_{1}) and x_{2} lies in U/3, so place 3 times this difference in x_{3}+U/3.
Continue this process, and z = x_{1} + x_{2}/3 + x_{3}/9 + x_{4}/27 … with the error term in U/3^{k}.

If you don't stumble upon equality early in the game, the points x_{i}, drawn from a finite list, will repeat.
That's ok; just group them together.
So x_{2}/3 + x_{2}/243 becomes x_{2} times 82/243, and so on.
This is the compressed sum.
Each of the n coefficients adds some of the terms of a geometric series, with powers of 3 in the denominator,
hence each coefficient converges absolutely to a real number.
Let c_{i} be this real number, the limit of the sum of the coefficients on x_{i},
and let W be the sum c_{1}x_{1} + c_{2}x_{2} + c_{3}x_{3} + … c_{n}x_{n}.

How close is the k^{th} approximation to z, and to W?
We already said the approximation is within U/3^{k} of z.
At worst, the difference between W and the k^{th} approximation
is a linear combination of x_{1} through x_{n},
where the coefficients are bounded by the tail of a geometric series.
These tails go to 0,
and eventually the multiples of x_{i} are all in V, and the resulting linear combinations all lie in U.
In other words, the error term lies in U.
If the coefficients are cut in half, the result lies in ½U.
As the coefficients approach 0, the error term fits into c×U, where c approaches 0.
Put this all together and W-z lies in every multiple of U, no matter how small.
However, if W-z is nonzero then some scale multiple of U excludes W-z.
This is a contradiction, hence W = z.

Since z was arbitrary, x_{1} through x_{n} span all of U.

Let T be the span of x_{1} through x_{n}.
Thus T contains U.
Since T is finite dimensional it looks like **R**^{n}.
It is closed in S, and contains U closure.
Suppose T does not contain some y in S.
Let T′ be the span of T and y.
Remember that T′ looks like **R**^{n+1}.
U is an open set in S, hence open in T, and in T′.
However, a set stuck in n dimensions cannot be an open set in n+1 dimensions.
The interior of a square may be open in the plane, but it is not open in 3 space.
Therefore T is all of S,
and S is finite dimensional, and euclidean.

A hilbert space
is a banach space with a dot product.
The definition of dot product, given below, is consistent with the euclidean definition in **R**^{n}.

The dot product is a binary operator whose operands are vectors in a banach space S. The result is a real number. If x and y are vectors in S, the dot product is indicated by a literal dot, as in x.y, hence the name dot product. This is also called an inner product.

The dot product respects scaling, so that c×(x.y) = cx.y = x.cy for any real constant c. Also, the dot product respects addition in either component. As a corollary, x.0 = x.(y-y) = x.y - x.y = 0, and similarly, 0.x = 0.

Symmetry is another requirement: x.y = y.x. (This does not hold for complex numbers; I'll get to that below.)

Finally, x.x = |x|^{2}, where |x| is the norm of x in the banach space S.
The norm is tied to the dot product.

Notice that the traditional dot product in euclidean space satisfies all these properties,
thus **R**^{n} is a hilbert space.

A finite dimensional banach space is homeomorphic to **R**^{n},
as demonstrated earlier.
Hence a finite dimensional banach space can be viewed as a hilbert space
by applying the euclidean norm and dot product.

Although we have skirted this topic thus far, a complex banach space is essentially a real banach space with some extra bells and whistles. It is a complex vector space with a norm that obeys the triangular inequality, thus a metric space. Addition is the same whether viewed as a complex space or a real space, hence addition is continuous. As with real scalars, the norm of c×x is the absolute value of c times the norm of x. Multiply x by anything on the unit circle and the norm does not change. Use this to show scaling by complex numbers is continuous, giving a continuous vector space. With this in mind, view S as a real vector space and it becomes a traditional banach space. The norm, only scaled by real numbers, is our old familiar norm once again.

The dot product changes slightly when S is viewed as a complex banach space.
For starters, the dot product produces a complex number, not just a real number.
A constant pulls out of the first operand unchanged, and out of the second operand conjugated: cx.y = c×(x.y), while x.cy is the conjugate of c times x.y.
Also, x.y and y.x are conjugates of each other.
This is consistent with the definition of dot product in n dimensional complex space,
where the second vector is conjugated,
then corresponding components are multiplied,
and the pairwise products are added together.
This seems like a trick, but it facilitates x.x = |x|^{2}.
Concentrate on one component in an n dimensional vector, say a+bi.
This is multiplied by a-bi, giving a^{2} + b^{2}, which contributes to the norm of the vector
just as it would in real space,
where a and b are separate real components.
The complex dot product in n dimensional complex space satisfies our properties,
and **C**^{n} is a complex hilbert space.
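Here is a sketch in **C**^{2}, using python's native complex numbers; the two vectors and the constant are arbitrary choices. The helper conjugates the second operand, as described above.

```python
def dot(x, y):
    """Complex dot product: conjugate the second operand."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

x = [1 + 2j, 3 - 1j]
y = [2 - 1j, -1 + 4j]
c = 2 - 3j

assert dot(y, x) == dot(x, y).conjugate()                       # x.y and y.x are conjugates
assert dot([c * a for a in x], y) == c * dot(x, y)              # constant on the first operand
assert dot(x, [c * a for a in y]) == c.conjugate() * dot(x, y)  # conjugated on the second
nsq = dot(x, x)
assert nsq.imag == 0 and nsq.real == sum(a.real**2 + a.imag**2 for a in x)
print("x.x =", nsq.real)   # 1+4+9+1 = 15
```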

Let M be the space of continuous real valued functions on [0,1].
(You can use complex functions if you like; it's not much different.)
This is a real vector space,
and the norm |f| = sqrt(∫ f^{2}) makes it a normed vector space.
(If f is complex then replace f^{2} with f times its conjugate.)
Actually we should stop and prove this is a norm.
Because f is continuous, it cannot stray from 0 at a single point;
it must leave 0 over a subinterval, giving a nonzero integral.
Thus f = 0 iff |f| = 0.
Scaling by a constant c multiplies the norm by |c|.
Finally, |f+g| ≤ |f| + |g|,
because the same is true of the riemann step functions
that approach f and g.
These step functions can be represented as vectors in n space
when dividing [0,1] into n subintervals,
and in that context the triangular inequality holds.
It holds in the limit as n approaches infinity, giving the integrals that define the norms.
The functions of M form a metric space.
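The norm and the triangle inequality can be sampled numerically. A sketch, approximating ∫ f^{2} with a midpoint riemann sum; the two sample functions are arbitrary choices.

```python
import math

def l2_norm(f, n=10000):
    """Approximate sqrt(integral of f^2 over [0,1]) with a midpoint riemann sum."""
    return math.sqrt(sum(f((i + 0.5) / n) ** 2 for i in range(n)) / n)

f = lambda t: math.sin(2 * math.pi * t)   # |f| should be sqrt(1/2)
g = lambda t: t * t                       # |g| should be sqrt(1/5)

lhs = l2_norm(lambda t: f(t) + g(t))
assert lhs <= l2_norm(f) + l2_norm(g) + 1e-9   # the triangle inequality
print("|f| ~", round(l2_norm(f), 4))
```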

Complete this metric space to build a banach space S.
The completion includes every function that is approached by continuous functions.
This includes piecewise continuous functions.
Let f_{n} = 0 from 0 to ½-1/n,
then slope up to 1 at x = ½+1/n, then remain at 1 across the rest of the unit interval.
The limit is the discontinuous function that is 0 on [0,½), ½ at ½, and 1 on (½,1].
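We can watch this sequence converge in the norm. A sketch, measuring the distance between ramp functions with the same riemann sum approximation of the integral norm:

```python
import math

def ramp(n):
    """f_n: 0 up to 1/2 - 1/n, a straight line up to 1 at 1/2 + 1/n, then 1."""
    def f(t):
        if t <= 0.5 - 1.0 / n:
            return 0.0
        if t >= 0.5 + 1.0 / n:
            return 1.0
        return (t - (0.5 - 1.0 / n)) * n / 2.0   # slope n/2 across the gap
    return f

def dist(f, g, m=20000):
    return math.sqrt(sum((f((i + 0.5) / m) - g((i + 0.5) / m)) ** 2
                         for i in range(m)) / m)

d1 = dist(ramp(4), ramp(8))
d2 = dist(ramp(16), ramp(32))
assert d2 < d1   # later terms huddle closer together, as a cauchy sequence must
print(round(d1, 4), ">", round(d2, 4))
```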

S may include functions that are not integrable. In fact the elements of S may not be functions at all. Tweak the above example, so that when n is odd the sloping line segment runs from ½-1/n,0 up to ½,1, and when n is even the segment runs from ½,0 up to ½+1/n,1. The limit function is 0 on [0,½) and 1 on (½,1], but is not defined at x = ½. Even more bizarre examples are possible. But as with any metric space, the completion consists of cauchy sequences, whether those sequences have a convenient representation or not.

You might wonder about the distance metric in S, where functions, and their integrals, are not well defined. Remember, distance in S ultimately comes from distance in M, which is always well defined. If f and g are elements of S, d(f,g) is the limit of the distances between the continuous functions that approach f and the continuous functions that approach g. In any metric space completion, this limit exists, hence distance is well defined in S, and makes S a metric space. In fact S is a complete metric space, since the completion is always complete.

With |f| defined on M, and on S, S becomes a complete normed vector space, or a banach space. Let's turn it into a hilbert space.

Let f.g be the integral of f×g, or f times the conjugate of g if functions are complex. Since f×g is continuous, this is well defined, and when f = g, the result is the square of the norm. Verify the properties of linearity and symmetry, and ∫ f×g becomes a dot product for M.

Extend this to all of S.
Let f_{n} and g_{n} be cauchy sequences in M, defining elements of S.
By ignoring leading terms, we can assume that all the terms in f_{n} are within 1 of each other.
∫ (f_{i}-f_{j})^{2} < 1.
This is the property of being cauchy, with ε set to 1.
Similarly, assume the terms of g are within 1 of each other.
If |f_{n}| exceeds |f_{1}|+1,
then the distance from f_{1} to f_{n} exceeds 1, which is a contradiction.
Tied to their first terms, all the terms of f, and all the terms of g, have norms below some constant w.

For a small ε, smaller than 1,
go out in both sequences so that functions beyond f_{n} are within ε of each other,
and similarly for g.
Consider pairs of functions f_{i} and f_{j}, and g_{i} and g_{j}.
The difference between the two dot products is the integral of f_{i}g_{i} - f_{j}g_{j}.
Can this be bounded by some constant times ε?

For notational convenience let u = f_{i} and let u+a = f_{j}.
Let v = g_{i} and let v+b = g_{j}.
(Remember that u, v, a, and b are continuous functions on [0,1].)
The integrand becomes ub + va + ab.
Consider the last term first.
The integral of ab, squared, is no larger than the integral of a^{2} times the integral of b^{2}.
How do we know?
Divide the unit interval into n subintervals and let riemann step functions approach the integrals.
The left side is the sum of a_{i}b_{i}, squared,
while the right side is the product of the sum of a_{i}^{2} times the sum of b_{i}^{2}.
The first is bounded by the second by the
cauchy schwarz inequality.
The same is true in the limit.
Since a is the gap between f_{i} and f_{j},
a.a < ε^{2}.
Similarly, b.b < ε^{2}.
The square of a.b is less than ε^{4},
hence |a.b| < ε^{2}.

Apply cauchy schwarz to the integral of ub,
and bound it by the square root of the integral of u^{2} times the integral of b^{2}.
This is |u|×|b|, at most wε.
Similarly, v.a is bounded by wε.
Put this all together, and the difference between the dot products in positions i and j is bounded by (2w+1)ε.
The sequence of dot products is cauchy, and converges to a real number.
This is the dot product of the two sequences f_{n} and g_{n} in S.

Is this well defined?
Let another sequence of continuous functions h_{n} represent the same element in S as g_{n}.
In other words, their difference, e_{n}, converges to 0.
Consider the limit of f_{n}.e_{n} as n approaches infinity.
Each term is bounded by |f_{n}| × |e_{n}|
(again using the cauchy schwarz inequality),
and |e_{n}| approaches 0,
while |f_{n}| stays below w.
The dot products f_{n}.g_{n} and f_{n}.h_{n} converge to the same real number,
and dot product is well defined in S.

The properties of linearity and symmetry are straightforward,
so consider the last property, the dot product of a sequence f_{n} with itself.
Replace the terms with |f_{n}|^{2}.
Since |f_{n}| approaches the norm of the entire sequence by definition,
and the limit of the squares is the square of the limit,
the sequence dotted with itself approaches the norm of the sequence, squared.
That completes the proof.

In summary, the completion of the continuous functions on [0,1] is a banach space, and a hilbert space, using integration and limits to define the norm and the dot product. Complex functions on [0,1] build a complex hilbert space.

The dot product in finite dimensional real or complex space
is a mathematical function using multiplication and addition,
and is continuous -
but how about a generic hilbert space S?
Is the inner product continuous from S*S into **R** or **C**?

The dot product is tied to the norm, and the norm satisfies certain properties, such as the triangular inequality. This can be used to prove cauchy schwarz in an arbitrary hilbert space. Use the properties of norm and dot product to write the following.

0 ≤ |x-ly|^{2} =
(x-ly).(x-ly) =

x.x - 2l(x.y) + l^{2}(y.y)

The inequality becomes equality iff x = ly.

Setting l = 0 or y = 0 gives 0 ≤ x.x, which isn't very interesting, so assume l > 0 and y ≠ 0, and write the following inequality.

2x.y ≤ x.x/l + ly.y

Set l = |x|/|y| and find x.y ≤ |x| × |y|. If x or y is 0 we have equality, and if x = ly we began with an equation that produces equality. This is cauchy schwarz. The dot product is bounded by the product of the norms, with equality iff one vector is a linear multiple of the other. Remember that 0 is technically a linear multiple of x, and sure enough, 0.x = |0| times |x|.
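A quick numerical confirmation in **R**^{6}, including the equality case y = 3x; the vectors are arbitrary.

```python
import math, random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

random.seed(5)
for _ in range(1000):
    x = [random.uniform(-3, 3) for _ in range(6)]
    y = [random.uniform(-3, 3) for _ in range(6)]
    assert abs(dot(x, y)) <= math.sqrt(dot(x, x) * dot(y, y)) + 1e-9

x = [1.0, -2.0, 0.5]
y = [3.0 * t for t in x]   # y = 3x, the equality case
assert abs(abs(dot(x, y)) - math.sqrt(dot(x, x) * dot(y, y))) < 1e-9
print("cauchy schwarz confirmed")
```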

Now for continuity.
Fix a vector v and consider the map x → v.x, a linear map from S into **R**.
Concentrate on the unit sphere in S, the vectors in S with norm 1.
There, v.x is bounded by |v|×|x| = |v|.
Thus x → v.x is a bounded operator,
hence continuous into **R**.

Let T be S cross S with the product topology.
This is another banach space.
The dot product now maps T into **R**.
Select x and y from S such that the ordered pair x,y in T has norm 1.
Thus |x| and |y| are at most 1 in S.
The dot product x.y is bounded by 1.
Once again a linear operator on a banach space is bounded, hence continuous.
The dot product is a continuous linear map from S cross S into the reals.

Two nonzero vectors in a hilbert space S are orthogonal if their dot product is 0. Two vectors are orthonormal if they are orthogonal and have norm one. Orthogonal vectors can always be scaled to become orthonormal. These agree with the traditional definitions.

Let x and y be orthonormal and consider the distance from x to y. This is the square root of (x-y).(x-y). Expand, and replace x.y with 0, to get x.x+y.y, or 2. Orthonormal vectors are sqrt(2) distance apart.
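A one-line check of this distance, with two coordinate vectors in **R**^{3}:

```python
import math

e1 = [1.0, 0.0, 0.0]
e2 = [0.0, 1.0, 0.0]   # an orthonormal pair
d = math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))
assert abs(d - math.sqrt(2)) < 1e-12
print("distance =", d)
```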

A set of vectors (possibly infinite) forms an orthogonal system if every pair of vectors is orthogonal. The vectors in an orthonormal system are orthogonal, with norm 1.

Suppose a linear combination of orthogonal vectors yields 0. Take the dot product with any of these vectors and find that the coefficient on that vector has to be 0. This holds across the board, hence the vectors in an orthogonal system are linearly independent.

Suppose S is a separable hilbert space with an uncountable orthogonal system. Convert to an orthonormal system and find uncountably many points that are all sqrt(2) distance from each other. Place an open ball of radius ½ about each point. These are disjoint open sets, hence any dense set has to be uncountable, and S is not separable. A separable hilbert space can only support a countable orthogonal system.

If S is finite dimensional,
use the gram schmidt process
to convert any basis into an orthonormal system.
In theory this works for a countable basis as well,
but then S is not complete.
Build a sequence by adding a scaled version of each coordinate in turn,
the i^{th} coordinate scaled by 1/2^{i}.
We'll see below that this sequence is cauchy,
but it cannot converge to any finite linear combination of these independent vectors.
These orthogonal vectors do not form a basis for S,
but they do form a hyperbasis, as we'll see below.
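A sketch of the gram schmidt process in **R**^{3}; the three starting vectors are arbitrary independent vectors.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Convert independent vectors into an orthonormal system."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:                    # subtract the projection onto
            c = dot(w, b)                  # each earlier basis vector
            w = [wi - c * bi for wi, bi in zip(w, b)]
        n = math.sqrt(dot(w, w))           # then normalize
        basis.append([wi / n for wi in w])
    return basis

b = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
for i in range(3):
    for j in range(3):
        target = 1.0 if i == j else 0.0
        assert abs(dot(b[i], b[j]) - target) < 1e-9
print("orthonormal system of size", len(b))
```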

Assume S is separable, and let b_{1} b_{2} b_{3} etc be a countable orthonormal system inside S.
If v is any vector in S, let a_{j} = v.b_{j}.
Let the n^{th} approximation of v be the partial sum over a_{j}b_{j}, as j runs from 1 to n.
What is the distance from v to its n^{th} approximation?
Answer this by looking at the square of the distance,
which is v - the n^{th} approximation, dotted with itself.
Let's illustrate with a_{1} and a_{2}.

(v-a_{1}b_{1}-a_{2}b_{2}) . (v-a_{1}b_{1}-a_{2}b_{2}) =

v.v - 2×(v.a_{1}b_{1}+v.a_{2}b_{2}) + (a_{1}b_{1}+a_{2}b_{2}).(a_{1}b_{1}+a_{2}b_{2}) =

v.v - 2×(a_{1}^{2}+a_{2}^{2}) + (a_{1}b_{1}+a_{2}b_{2}).(a_{1}b_{1}+a_{2}b_{2}) =

v.v - 2×(a_{1}^{2}+a_{2}^{2}) + a_{1}^{2}|b_{1}|^{2} + a_{2}^{2}|b_{2}|^{2} =

v.v - 2×(a_{1}^{2}+a_{2}^{2}) + a_{1}^{2} + a_{2}^{2} =

v.v - (a_{1}^{2}+a_{2}^{2})

This value is nonnegative for all n, thus giving bessel's inequality:

∑ {1,∞} a_{i}^{2} ≤ v.v

The squares of the coefficients of v are nonnegative numbers, and together they build a monotone series that is bounded by v.v. The terms approach 0, and the series converges absolutely to a real number between 0 and v.v.
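Bessel's inequality is easy to test in finite dimensions with an orthonormal system that is not maximal. A sketch in **R**^{4}, with two orthonormal vectors of my choosing:

```python
import math, random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

# an orthonormal system in R^4 that is not maximal
b1 = [1.0, 0.0, 0.0, 0.0]
b2 = [0.0, 1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]
assert abs(dot(b1, b2)) < 1e-12 and abs(dot(b2, b2) - 1) < 1e-12

random.seed(7)
for _ in range(100):
    v = [random.uniform(-5, 5) for _ in range(4)]
    a = [dot(v, b1), dot(v, b2)]                       # coefficients a_j = v.b_j
    assert sum(t * t for t in a) <= dot(v, v) + 1e-9   # bessel's inequality
print("bessel's inequality holds")
```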

The square of the norm of the difference between the i^{th} approximation
and the j^{th} approximation
is a section of the above series from a_{i}^{2} to a_{j}^{2}.
As we move out in the series, these slices approach 0.
The approximations define a sequence that is cauchy,
and since S is complete, the approximations approach an element that I will call u.

Remember that dot product is a continuous map from S cross S into **R**.
If u is the limit of a cauchy sequence in S,
then u.b_{j} is the limit of the cauchy sequence dotted with b_{j}.
The approximations form our cauchy sequence,
and when dotted with b_{j},
they produce 0 for a while, and then a_{j} thereafter.
Therefore u.b_{j} = a_{j}.
In other words, u and v produce the same coefficients.
Every v generates a series of coefficients according to our orthonormal set b_{1} b_{2} b_{3} etc,
and these coefficients determine an element u in S.

If v ≠ u, the coefficients of v-u are all 0. Normalize v-u to obtain a new unit vector orthogonal to every b_{j}, whence the system is not maximal. Conversely, if the orthonormal system is maximal, then every vector v is uniquely represented by its coefficients, and the sequence of approximations approaches v. A maximal orthonormal system is also called complete, or total.

Apply zorn's lemma, and S contains a maximal orthonormal system, which acts as a hyperbasis for S. Note the difference in terminology; a basis spans using finite linear combinations, but a hyperbasis spans using infinite sums, i.e. infinitely many basis vectors could participate, as long as the squares of the coefficients sum to a real number. This is required by bessel's inequality, and if it holds, the approximations are cauchy, and the sequence converges to something in S. Therefore, the points of S are uniquely represented by square summable sequences, according to the designated hyperbasis.
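
For a maximal system the approximations recover v exactly. A python sketch in two dimensions, using an arbitrarily rotated orthonormal basis (the angle and vector are made up for illustration):

```python
import math

# Sketch: against a maximal orthonormal system, the coefficients
# recover v exactly. A rotated orthonormal basis of R^2:
t = 0.7
b = [(math.cos(t), math.sin(t)), (-math.sin(t), math.cos(t))]
v = (3.0, 4.0)

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

a = [dot(v, bj) for bj in b]   # coefficients a_j = v . b_j

# rebuild v as the sum of a_j b_j
rebuilt = [sum(a[j] * b[j][k] for j in range(2)) for k in range(2)]
assert all(abs(rebuilt[k] - v[k]) < 1e-12 for k in range(2))
```

Drop either basis vector and the reconstruction falls short of v, which is Bessel's inequality again.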

If S is a finite dimensional **R** vector space,
every orthonormal system is finite.
A maximal system in n dimensional space contains n elements.
It is a basis for the hilbert space.

If S is infinite dimensional, a finite orthonormal system will not span,
hence a maximal orthonormal system is infinite.
If S is separable, each such system is countable,
hence the orthonormal basis can be designated b_{1} b_{2} b_{3} etc, out to infinity.
The finite dimensional hilbert space is equivalent to **R**^{n},
and the separable infinite dimensional hilbert space is equivalent to L_{2},
the square summable sequences.
This equivalence is more than a homeomorphism;
the norm and dot product are also determined.
There is but one hilbert space for each nonnegative integer,
and one infinite, separable hilbert space, up to isomorphism.

We've demonstrated uniqueness, but what about existence?
**R**^{n} is a hilbert space, using the euclidean norm and dot product.
Let's prove L_{2} is a separable hilbert space.

Let S = L_{2}, the set of square summable sequences.
If f is such a sequence, let |f| be the square root of the sum of its squares.
This is 0 only when f is 0.

Scale a square summable sequence by c and find another square summable sequence. Furthermore, |cf| = |c|×|f|.

If f and g are two sequences, consider the first n terms of f+g.
The triangular inequality is valid in n space.
Thus the square root of the sum of the squares of the first n terms of f+g
is no larger than the square root of the sum of the squares of the first n terms of f,
plus the square root of the sum of the squares of the first n terms of g.
This holds for all n, so take limits as n approaches infinity.
Given any of these square summable sequences,
what is the limit of the norm of the first n terms, as n approaches infinity?
Since square root is a continuous function from **R** into **R**,
applying square root to the partial sums of a convergent series is the same as applying square root to the limit.
In other words, the limit of the norms of the partial sums is simply the norm of the entire sequence.
Therefore, |f+g| ≤ |f| + |g|.
This proves the triangular inequality, and it also proves f+g belongs to S,
which was not obvious at the outset.
Therefore S is a normed vector space.
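
The limiting argument can be checked numerically. A python sketch, with two square summable sequences chosen arbitrarily for illustration:

```python
import math

# Sketch: the truncated L2 norms obey the triangle inequality in n space,
# and they converge to the norm of the full sequence.
def partial_norm(seq, n):
    return math.sqrt(sum(seq(i) ** 2 for i in range(1, n + 1)))

f = lambda i: 0.5 ** i           # f_i = 1/2^i
g = lambda i: (1.0 / 3.0) ** i   # g_i = 1/3^i

for n in (5, 50, 500):
    lhs = partial_norm(lambda i: f(i) + g(i), n)
    rhs = partial_norm(f, n) + partial_norm(g, n)
    assert lhs <= rhs + 1e-12    # |f+g| <= |f| + |g| at every truncation

# In the limit, |f|^2 = sum 1/4^i = 1/3 and |g|^2 = sum 1/9^i = 1/8.
assert abs(partial_norm(f, 500) ** 2 - 1.0 / 3.0) < 1e-9
assert abs(partial_norm(g, 500) ** 2 - 1.0 / 8.0) < 1e-9
```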

To show S is complete, let u_{1} u_{2} u_{3} etc be a cauchy sequence in S.
Strip off leading terms, so that all terms are within 1 of each other.
With this in mind, all the norms are bounded, no more than 1 away from the norm of u_{1}.
Project this sequence onto the j^{th} coordinate.
In other words, look at the j^{th} term in u_{1} u_{2} u_{3} etc.
If u_{n} is the n^{th} row in an infinite matrix, we are moving down the j^{th} column.
The distance between any two elements u_{m} and u_{n} is at least the difference in their j^{th} coordinates.
Therefore, the projection onto the j^{th} coordinate defines a cauchy sequence in **R**.
This converges to a real number that I will call c_{j}.
This defines a new sequence c_{1} c_{2} c_{3} etc,
which is the bottom row of our infinite matrix.

Suppose c is not square summable.
In other words, the sum over c_{i}^{2} is unbounded.
For some finite index j, the norm of the first j terms of c,
a vector in j dimensional space, exceeds the norms of all the elements u_{n} in our cauchy sequence.
Restrict to j dimensional space, the first j columns of our matrix.
Within this space, the truncated norm of c is the limit of the truncated norms of the u_{n}, as n runs to infinity.
Each truncated norm is no larger than the full |u_{n}|,
so this limit cannot rise above the largest |u_{n}|,
yet the truncated norm of c lies above all these norms.
This is a contradiction, hence c is square summable, and belongs to S.

Subtract c from each u_{n}.
This does not change the distance between u_{m} and u_{n}, hence the resulting sequence is still cauchy in S.
It converges to 0 iff the original sequence converges to c.
We have a cauchy sequence u_{n}, and the components all converge to 0,
and we want to show that u converges to 0.

A sequence approaches 0 iff its norms approach 0.
Choose m so that every u_{n} beyond u_{m} is within ε/3 of u_{m}.
Suppose |u_{m}| is at least 2ε/3.
Its partial norm climbs above ε/3 after finitely many terms, say j.
Once again we are in j dimensional space,
where the sequence converges to 0.
Some u_{n} beyond u_{m} is very close to 0, at least in j space, so the norm of u_{m}-u_{n}, restricted to j space, is above ε/3.
This can only get worse as we include the coordinates beyond j.
That contradicts our choice of m,
hence |u_{m}| < 2ε/3.
Each u_{n} beyond this point has norm bounded by
2ε/3 + ε/3, or ε.
The norms converge to 0, the sequence converges to 0,
the original sequence converges to c,
and S is complete.
This makes S a banach space.

Let the dot product of f and g be the sum over f_{i}g_{i}.
After n terms, the truncated dot product, squared,
is bounded by the sum of the squares of the first n terms of f,
times the sum of the squares of the first n terms of g.
This is another application of cauchy schwarz.
Taking square roots,
the absolute value of the truncated dot product is bounded by the product of the two partial norms of f and g.
This applies for all n, hence it applies in the limit.
Therefore the dot product, as a sum of products, is absolutely convergent, and is well defined.
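
A python sketch of this bound, using two square summable sequences chosen for illustration:

```python
import math

# Sketch: the L2 dot product converges, with each truncation bounded
# by cauchy schwarz.
f = lambda i: 1.0 / i           # square summable
g = lambda i: (-1.0) ** i / i   # square summable, alternating signs

def truncated(n):
    d = sum(f(i) * g(i) for i in range(1, n + 1))
    nf = math.sqrt(sum(f(i) ** 2 for i in range(1, n + 1)))
    ng = math.sqrt(sum(g(i) ** 2 for i in range(1, n + 1)))
    return d, nf, ng

for n in (10, 100, 1000):
    d, nf, ng = truncated(n)
    assert abs(d) <= nf * ng + 1e-12   # cauchy schwarz at every truncation

# For this pair, f.g = sum (-1)^i / i^2, which converges to -pi^2/12.
d, _, _ = truncated(1000)
assert abs(d + math.pi ** 2 / 12) < 1e-5
```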

Note that all the properties of a dot product are satisfied,
including f.f = |f|^{2}.
This makes S a hilbert space.

Finally, prove S is separable.
Let a sequence of coefficients lie in the dense set D if
all but finitely many terms are 0, and the remaining terms are rational.
This is a countable set.
Let h be a base open set centered at v,
with radius ε.
Represent v by its square summable sequence of coefficients a_{j}.
Choose n so that the n^{th} approximation is within ½ε of v.
Then move the first n coefficients to nearby rational values.
These rational numbers can be arbitrarily close to the real coefficients of v.
The result is a point in D that is within ε of v.
The countable set D is dense, and S is separable.
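
A python sketch of this approximation, with a made-up target sequence and tolerance:

```python
import math
from fractions import Fraction

# Sketch: sequences with finitely many rational terms come within eps
# of any point of L2. Target v_i = 1/2^i, eps chosen for illustration.
eps = 1e-3
v = lambda i: 0.5 ** i

# choose n so the discarded tail has norm below eps/2;
# the tail of squares is sum_{i>n} 1/4^i = (1/3)/4^n
n = 1
while math.sqrt((1.0 / 3.0) / 4 ** n) >= eps / 2:
    n += 1

# round the first n terms to rationals with denominator 10^6
approx = [Fraction(round(v(i) * 10 ** 6), 10 ** 6) for i in range(1, n + 1)]

# squared distance from the rational point to v
err2 = sum((v(i) - float(a)) ** 2 for i, a in enumerate(approx, start=1))
err2 += (1.0 / 3.0) / 4 ** n           # plus the truncated tail
assert math.sqrt(err2) < eps           # within eps of v
```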

As a hilbert space, L_{2} has a countable dimension,
corresponding to its complete orthogonal system.
However, as a vector space, the dimension of L_{2} is uncountable.
I hinted at this earlier.
Suppose L_{2} has a countable basis, as an **R** vector space.
Write the basis as an infinite matrix, where the i^{th} row holds the square summable sequence that is the i^{th} basis element.
Use lower gaussian elimination to make the matrix upper triangular.
Then use upper gaussian elimination to make the matrix diagonal.
(If it doesn't come out diagonal then the matrix does not span, and is not a basis.)
The span of this basis is now the direct sum of the coordinates,
and the sequence 1, 1/2, 1/4, 1/8, 1/16, …, which is square summable, is not accessible.

Let S be a separable hilbert space with a complete orthonormal basis b_{1} b_{2} b_{3} etc.
Let f be a linear function from S into **R**.
f is continuous iff f(x) = x.u for some fixed vector u.
Every continuous linear function is really a dot product.

Let c_{i} = f(b_{i}).
Let v_{n} be the sum of c_{i}b_{i}, as i runs from 1 to n.
Use the linear properties of f to show f(v_{n}) =
the sum of c_{i}^{2}, as i runs from 1 to n.

Remember that continuous is the same as bounded.
Since f is bounded, let k be a bound on f.
Now |f(v_{n})| is no larger than k×|v_{n}|.
Replace |v_{n}| with the square root of the sum of c_{i}^{2}.
At the same time, f(v_{n}) is the sum of c_{i}^{2}.
The norm of v_{n} is the square root of f(v_{n}).

f(v_{n}) ≤ k × sqrt(f(v_{n}))

sqrt(f(v_{n})) ≤ k

f(v_{n}) ≤ k^{2}

This applies in the limit,
hence c_{i} forms a square summable sequence bounded by k^{2}.
Apply this to the hyperbasis and find a point in S.
Let u be the infinite sum of c_{i}b_{i}.

Let a_{i}b_{i} be the representation of an arbitrary element x in S.
Since f is continuous, f(x) becomes the limit of f applied to the partial sums.
This becomes the sum over a_{i}c_{i}.
The same formula appears if you evaluate the dot product of x and u.
Therefore f(S) = S.u.

To find the bound k,
consider the points on the unit sphere in S.
Let x have norm 1, and apply cauchy schwarz.
The square of x.u is bounded by the sum of the squares of x, times the sum of the squares of u.
Since |x| = 1, the absolute value of f(x) is bounded by |u|.
Now f(u) = u.u = |u|^{2},
so f applied to the unit vector u/|u| yields |u|,
hence the bound on f is precisely |u|.

Suppose S.u, a continuous linear map, equals S.v for some other vector v.
Apply u-v to S and get 0.
In other words, u-v dot all of S yields 0.
However, (u-v).(u-v) is nonzero, hence each linear function from S into **R** is S.u for a unique u.
The bound on such a function is |u|, and is attained at the unit vector u/|u|.
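
In finite dimensions the entire construction can be carried out explicitly. A python sketch on **R**^{3}, with a made-up functional f:

```python
# Finite dimensional sketch of the theorem: a linear functional on R^3
# is the dot product with the fixed vector u whose coordinates are
# f applied to the orthonormal basis vectors.
def f(x):
    return 2 * x[0] - x[1] + 3 * x[2]   # arbitrary linear functional

e = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]   # orthonormal basis
u = tuple(f(ei) for ei in e)            # u_i = f(b_i)

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

x = (5, -7, 2)
assert f(x) == dot(x, u)                # f really is x . u
assert f(u) == dot(u, u)                # f(u) = |u|^2, realizing the bound
```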

If b is an orthonormal system in a possibly inseparable hilbert space S, then every vector v has countably many nonzero coefficients with respect to b, even though b might be uncountable.

Suppose v has uncountably many nonzero coefficients in its representation.
Let ℵ_{1} be the first uncountable ordinal, also written ω_{1}.
Assign an ordinal below ℵ_{1}
to each of these nonzero coefficients,
so they appear in order.

Consider a finite sum y = a_{1}b_{1} + a_{2}b_{2} + a_{3}b_{3}, the start of our series.
Let z = v - y.
Now z.b_{1} = a_{1}-a_{1} = 0.
By linearity, z.y = 0.
Expand (y+z).(y+z) and get |y|^{2} + |z|^{2}.
This equals |v|^{2}.
Thus a_{1}^{2} + a_{2}^{2} + a_{3}^{2} ≤ |v|^{2}.
The same reasoning holds for any finite collection of terms drawn from v.

Let d be a countable limit ordinal,
and assume each set of coefficients bounded strictly below d is square summable with a cap of |v|^{2}.
Arrange all the coefficients below d in a possibly different order, since they are countable,
so they simply proceed 1 2 3 … to infinity.
Each partial sum of squares is bounded by |v|^{2}, and so is the limit.
This is an absolutely convergent series, and every subseries is bounded by the same limit.
Each set of coefficients below d is square summable with a cap of |v|^{2},
including all the coefficients below d.
Square summable extends to the limit ordinal.

Let y be a countable linear combination of a_{i}b_{i}, the start of our uncountable series,
up to but not including an ordinal d.
Assume the coefficients are square summable,
thus y is well defined as a point in S.
Using the continuity of the dot product,
y.b_{i} is a series 0+0+0+…+a_{i}+0+0+…, giving a_{i}.
Set z = v - y as before.
Thus z.b_{i} = 0,
z.y is an infinite sum of zeros,
and z.y = 0.
Expand (y+z).(y+z),
and |y|^{2} ≤ |v|^{2}.
Put a_{d}b_{d} at the start of the sum, and the same relationship holds.
Square summable extends to the successor ordinal.

By transfinite induction,
every countable partial sum of our uncountable series of coefficients is square summable
with a cap of |v|^{2}.

Let d be an ordinal below ℵ_{1}.
Let e_{d} be the sum of the squares of the coefficients below d.
This sum is bounded by |v|^{2}.
Since all coefficients are nonzero, each e_{d} is strictly larger than all the values of e that came before.
Each e_{d} jumps over a new rational number.
That requires uncountably many distinct rationals between 0 and |v|^{2}, yet the rationals are countable, and that is a contradiction.
Therefore v only uses a countable subset of b.

Given v, restrict attention to those hyperbasis elements with v.b_{i} nonzero.
As shown above, the coefficients a_{i} form a square summable sequence,
the partial sums over a_{i}b_{i} are cauchy,
the limit u exists in the complete metric space S,
u and v produce the same coefficients, and v-u is orthogonal to each b_{j}.
If b is a total orthonormal system, then v-u = 0,
and v is faithfully represented by a countable set of nonzero coefficients applied to the hyperbasis b.

Now consider a continuous linear operator f from S
into **R**, bounded by k.
Let c_{i} = f(b_{i}),
and since b comprises unit vectors, each c_{i} is bounded by k.
Suppose uncountably many hyperbasis elements b_{i} have nonzero images in **R**.
Associate these elements with the ordinals below ℵ_{1}.
I'm going to rehash some material from the previous section.
Let v be the finite sum over c_{i}b_{i} as i runs from 1 to n.
Now f(v) is the sum of c_{i}^{2} as i runs from 1 to n.
|f(v)| is no larger than k×|v|.
Replace |v| with the square root of the sum of c_{i}^{2}.
At the same time, f(v) is the sum of c_{i}^{2}.
The norm of v is the square root of f(v).

f(v) ≤ k × sqrt(f(v))

sqrt(f(v)) ≤ k

f(v) ≤ k^{2}

∑ c_{i}^{2} ≤ k^{2}

Let d be a countable limit ordinal,
and assume each set of hyperbasis elements bounded strictly below d has square summable images with a cap of k^{2}.
Arrange all the elements below d in a possibly different order, since they are countable,
so they simply proceed 1 2 3 … to infinity.
Map these basis elements over to **R** and square the images.
Each partial sum of squares is bounded by k^{2}, and so is the limit.
This is an absolutely convergent series, and every subseries is bounded by the same limit.
Each set of hyperbasis elements below d has square summable images with a cap of k^{2},
including all the elements below d.
Square summable extends to the limit ordinal.

Consider a countable set b_{i}, the start of our uncountable series,
up to but not including an ordinal d.
Assume the images are square summable,
thus y, the sum of c_{i}b_{i}, is well defined as a point in S.
Think of y as the limit of a countable sequence v_{n}, using the finite linear combinations v as above.
Remember that each f(v_{n}) is bounded by k^{2}.
By the continuity of f,
f(y) ≤ k^{2}.
The sum of c_{i}^{2} is bounded by k^{2}.
Put b_{d} at the start of the series and rebuild y, and the same relationship holds.
Square summable extends to the successor ordinal.

By transfinite induction,
every countable partial sum of our uncountable series in **R** is square summable
with a cap of k^{2}.

Let d be an ordinal below ℵ_{1}.
Let e_{d} be the sum of the squares of the images below d.
This sum is bounded by k^{2}.
Since all images are nonzero, each e_{d} is strictly larger than all the values of e that came before.
Each e_{d} jumps over a new rational number.
That requires uncountably many distinct rationals between 0 and k^{2}, yet the rationals are countable, and that is a contradiction.
Therefore f is nonzero on a countable subset of b,
and is 0 elsewhere.

Concentrate on the countable subsequence b_{i} where c_{i} is nonzero.
Let u be the infinite (or perhaps finite) sum over c_{i}b_{i}.
The previous section applies.
A bounded (continuous) linear operator is equivalent to S.u,
and the bound is |u|, attained at u/|u|.

If the cardinality of b is g, then we need at least g elements in any dense set. This is because the elements of b are all sqrt(2) units apart, and can be enclosed in disjoint open balls of radius ½. Conversely, we can produce a dense set with g elements by taking all finite linear combinations of b with rational coefficients. Each v is a countable linear combination drawn from b, and the finite linear combinations approach v. Find a finite partial sum within ½ε of v, then adjust the coefficients to nearby rational numbers, so that the adjustment doesn't stray by more than ½ε. The result is inside the open ball about v with radius ε.

Beyond **R**^{n}, the dimension of S, as a hilbert space, is equal to the size of the smallest dense set in S.
The latter is a function of the topology.
Thus the dimension of a hilbert space is well defined.
If you want to change the dimension, you have to change the topology of S, or find a new set altogether.

Two hilbert spaces are isomorphic iff they have the same dimension.
The hilbert space of dimension g has an orthonormal basis of size g,
and all countable linear combinations thereof
such that the coefficients are square summable.
To prove this construction is in fact a hilbert space,
review the earlier theorem for L_{2}.
Not much has changed, as long as you remember that the union of two countable sets remains countable.
The sum of two elements in S still draws on a countable subset of b.
Even the infinite countable union of countable sets is countable.
This is used to prove S is complete.
The cauchy sequence is countable, and each element in this sequence uses a countable subset of b,
hence the entire cauchy sequence lives in a separable hilbert subspace of S, and approaches its limit.

In summary, there is one hilbert space, up to isomorphism, for each nonzero cardinal, and the space is completely characterized as the countable linear combinations of basis elements with square summable coefficients.