A brief survey of linear algebra

{\LARGE\bf A brief survey of linear algebra}\newline \newline \newline

A brief survey of linear algebra

Tom Carter

http://cogs.csustan.edu/~tom/linear-algebra
Santa Fe Institute
Complex Systems Summer School

June, 2001

Our general topics: Top

Why linear algebra
Vector spaces (ex)
Examples of vector spaces (ex)
Subspaces (ex)
Linear dependence and independence (ex)
Span of a set of vectors (ex)
Basis for a vector space (ex)
Linear transformations (ex)
Morphisms - mono, epi, and iso (ex)
Linear operators (ex)
Normed linear spaces (ex)
Eigenvectors and eigenvalues (ex)
Change of basis (ex)
Trace and determinant (ex)
References

(ex): exercises.

Why linear algebra Top

Linear models of phenomena are pervasive throughout science. The techniques of linear algebra provide tools which are applicable in a wide variety of contexts.
Beyond that, linear algebra courses are often the transition from lower division mathematics courses such as calculus, probability/statistics, and elementary differential equations, which typically focus on specific problem solving techniques, to the more theoretical axiomatic and proof oriented upper division mathematics courses.
I am going to stay with a generally abstract, axiomatic presentation of the basics of linear algebra. (But I'll also try to provide some practical advice along the way ... :-)

Vector spaces Top

The first thing we need is a field F of coefficients for our vector space. The most frequently used fields are the real numbers R and the complex numbers C. A field F = (F, +, *, -, ^-1, 0, 1) is a mathematical object consisting of a set of elements (F), together with two binary operations (+, *), two unary operations (-, ^-1), and two distinguished elements 0 and 1 of F, which satisfy the fundamental properties:
1. F is closed under the four operations:
  
  +
  
  :
  
  Fx FŽ F
  *
  
  :
  
  Fx FŽ F
  -
  
  :
  
  FŽ F
  ^-1
  
  :
  
  F˘Ž F˘ F˘ = {a Î F| a š 0}
  
  Of course, we usually write a + b, a * b, -a, and a^-1 instead of +(a,b), *(a, b), -(a), and ^-1(a).
2. `+' and `*' are commutative and associative, and satisfy the distributive property. That is, for a, b, c Î F:
  
  a + b
  
  =
  
  b + a
  a * b
  
  =
  
  b * a
  (a + b) + c
  
  =
  
  a + (b + c)
  (a * b) * c
  
  =
  
  a * (b * c)
  a * (b + c)
  
  =
  
  a * b + a * c
3. 0 is the identity element and `-' is the inverse for addition. 1 is the identity element and `a^-1' is the inverse for multiplication. That is, for a Î F:
  
  a + 0
  
  =
  
  a
  a + (-a)
  
  =
  
  0
  a * 1
  
  =
  
  a
  a * (a^-1)
  
  =
  
  1 (a š 0)
4. Although we won't need it for most of linear algebra, I'll mention that R and C are both complete (Cauchy sequences have limits), and R is fully ordered (a < b or b < a or a = b for all a, b Î R).
5. As needed, we will identify R as a subfield of C, and we will typically write elements of C as a + bi, where a and b are real and i² = -1.
6. Although R and C are the most frequently used coefficient fields, there are many other fields such as Q, the rationals, and the finite fields Z/pZ for p a prime.

If we leave out the requirement that a * b = b * a, we get what are called skew fields. An important example of a skew field is H, the quaternions (also called the hamiltonians) which contains additional square roots of -1, i² = j² = k² = -1, and ij = k, jk = i, ki = j, but ij = -ji, jk = -kj, and ki = -ik. We typically write quaternions either in the form a + bi + cj + dk with a, b, c, d Î R, or
a+ bj with a, b Î C.

We are then ready for the definition. A vector space V = (V, F, +, -, *, 0) over a field F is a set V of elements (called vectors) together with a distinguished element 0, two binary operations, and a unary operation:

V x V Ž V

V Ž V

Fx V Ž V

For u, v, w Î V, and a, b Î F, these operations satisfy the properties:

v + w

w + v

(u + v) + w

u + (v + w)

v + 0

v + (-v)

1 * v

a * (u + v)

(a * u) + (a * v)

(a + b) * v

(a * v) + (b * v)

(a * b) * v

a * (b * v)

The elements of the field F are called the scalars, or coefficients of the vector space.

From the basic properties listed above, we can prove a variety of additional properties, such as:

0 * v

=

0
a * 0

=

0
(-1) * v

=

-v

We can also prove that the additive identity 0 is unique, as is the additive inverse -v.
We will usually simplify the notation, and write av instead of a * v. Furthermore, although it is important to distinguish the scalar 0 from the vector 0, we will typically write the vector in the simple form 0.
If F = R, we call V a real vector space, and typically write the scalars as a, b, c. If F = C, we call V a complex vector space, and often write the scalars as a, b, g.

Exercises: Vector spaces Top

Using the basic properties listed above, prove the additional properties:
1. 0 * v = 0
2. a * 0 = 0
3. (-1) * v = -v
4. -(-v) = v
5. For a Î F and v Î V, av = 0 iff a = 0 or v = 0.
Prove that the additive identity 0 is unique.
Prove that the additive inverse -v is unique.

Examples of vector spaces Top

The first main example of a real vector space is Rⁿ, the Cartesian product of n copies of the real line. An element of Rⁿ looks like (a₁, a₂, ź, a_n). When we add two vectors, we get

(a₁, a₂, ź, a_n) + (b₁, b₂, ź, b_n)

= (a₁+b₁, a₂+b₂, ź, a_n + b_n).

When we multiply by a scalar a Î R, we get

a * (a₁, a₂, ź, a_n) = (aa₁, aa₂, ź, aa_n).
Similary, we have the complex vector space Cⁿ, with vector addition and scalar multiplication defined the same way.
Note that we can also think of Cⁿ as a real vector space, if we restrict the scalars to R.
Let F^Ľ = {(a₀, a₁, ź)}, with
(a₀, a₁, ź) + (b₀, b₁, ź)
= (a₀ + b₀, a₁ + b₁, ź),
and a * (a₀, a₁, ź) = (a*a₀, a*a₁, ź).
This is a vector space over F.
For a scalar field F, we can define F[x] to be the set of all polynomials with formal variable x, and coefficients in F. This is a vector space over F, the coefficient field. Each polynomial is a vector. Vector addition is just polynomial adddition, and scalar multiplication just multiplies each coefficient in the polynomial by the scalar:

a * (a₀ + a₁x¹ + źa_nxⁿ)

= aa₀ + aa₁x¹ + źaa_nxⁿ.
For a scalar field F, let Fⁿ[x] be the set of all polynomials of degree Ł n with coefficients in F. We can also let F^Ľ[x] be the vector space of (formal) power series over F. These are also vector spaces over F.
Let C⁰(R) be the set of all continuous functions with domain and range the real numbers. That is:
C⁰(R) = {f:RŽ R | f is continuous}.
We can define an addition operation on this set. To specify the sum of two functions, we must specify the sum at each point in the domain. If f, g Î C⁰(R), we define f + g for each x by

(f + g)(x) = f(x) + g(x).

We define scalar multiplication pointwise also: (a*f)(x) = a*(f(x)).
C⁰(R) thus becomes a real vector space, where each continuous function is a vector in the space.
Let Cⁿ(R) be the set of all continuous real functions whose first n derivatives are also continuous, and define addition and scalar multiplication pointwise. Similarly, we can define C^Ľ(R) as functions with all derivatives continuous. Again we get real vector spaces.
Let C⁰[0,1] be the set of all continuous functions with domain the closed interval [0,1], and range R. We can also define Cⁿ[0,1] to be those functions with first n derivatives continuous, and C^Ľ[0,1] with all derivatives continuous. We can also generalize to subsets of R other than the interval [0,1]. With pointwise addition and scalar multiplication, these are each real vector spaces.
We get can get similar complex vector spaces if we use C instead of R in examples like those above.

Exercises: Examples of vector spaces Top

Show that each of the examples listed in this section is a vector space.
Consider the set of points in R² of the forms (x, 0), x Î R, and (0, y), y Î R (i.e., the union of the X-axis and the Y-axis). Show that with the usual vector addition in R², this is not a vector space over R.
Consider the set R² with usual vector addition:

(x₁, y₁) + (x₂, y₂) = (x₁ + x₂, y₁ + y₂),

but with ``scalar multiplication'' given by

a * (x, y) = (ax, y)

for a Î R. Show that this is not a vector space over R.

Subspaces Top

If V is a vector space over F, a subset U Ě V is called a vector subspace of V if U is a vector space over F using the same vector addition and scalar multiplication as in V. Often, in context, we will just call U a subspace of V.
Most often, given a subset U Ě V, we will just let U inherit the vector addition and scalar multiplication operations from V. Hence, most of the vector space properties will come for free. What may not come for free, however, is whether the subset U is closed under the operations.
Thus, when we inherit the operations from V, we will have that

+ : U x U Ž V,

and

* : Fx U Ž V,

whereas, what we need is

+ : U x U Ž U,

and

* : Fx U Ž U.

In order for U to be a vector subspace, we must be sure that doing the operations will leave us in the subset U.
Examples of subspaces:
1. {0} is a subspace of V for any vector space V.
2. If m Ł n, then F^m (identified with {(a₁, a₂, ź, a_m, 0, ź, 0)}) is a subspace of Fⁿ.
3. If m Ł n, then F^m[x] is a subspace of Fⁿ[x], and a subspace of F[x].
4. If m Ł n, then Cⁿ(R) and C^Ľ(R) are subspaces of C^m(R) .
5. In R², let U = {(x, y) | y = 3x}. Then U is a subspace.
6. More generally, in Rⁿ, fix a₁, a₂, ź, a_n, and let U = {(x₁, x₂, ź, x_n) | a₁x₁ + a₂x₂ + ź+ a_nx_n = 0}.
  Then U is a subspace.
7. If U₁ and U₂ are subspaces, so is U₁ ÇU₂.
Examples which are not subspaces:
1. In R², let U = {(x, y) | x = 0 or y = 0} (i.e., U is the union of the x-axis and the y-axis). Then U is not a subspace, since the two vectors (1,0) and (0,1) are in U, but (1,0) + (0,1) = (1,1) is not in U.
2. In R², let U = {(x, y) | x and y are both rational numbers}. Then U is not a subspace, since (1, 1) is an element of U, but Ö2*(1, 1) = (Ö2, Ö2) is not an element of U.
3. In Rⁿ, fix a₁, a₂, ź, a_n, and let U = {(x₁, x₂, ź, x_n) | a₁x₁ + a₂x₂ + ź+ a_nx_n = 2}. Then U is not a subspace, since, for example, 0 is not an element of U.
4. In general, if U₁ and U₂ are subspaces, U₁ ČU₂ is not a subspace (except in special cases, such as when U₁ Ě U₂).
If U₁ and U₂ are both subspaces of a vector space V, we can define the subspace which is the sum of the two subspaces by:

U₁ + U₂ = {u₁ + u₂ | u₁ Î U₁ and u₂ Î U₂}.
If in addition U₁ ÇU₂ = {0}, then each element u Î U₁ + U₂ can be written in a unique way as u = u₁ + u₂. In this case, we call the sum of U₁ and U₂ a direct sum, and write it U₁ ĹU₂.
These ideas generalize in a straightforward way to U₁+ź+ U_n and U₁ĹźĹU_n for a finite number of subspaces, and U₁+U₂+ź and U₁ĹU₂Ĺź for countably many subspaces.

Exercises: Subspaces Top

Show that each of the examples identified in this section as a subspace actually is a vector subspace.
Which of the following are vector subspaces of R³?
1. {(x₁, x₂, x₃) Î R³ | 3x₁ + 2x₂ - x₃ = 0}
2. {(x₁, x₂, x₃) Î R³ | 3x₁ + 2x₂ - x₃ = 4}
3. {(x₁, x₂, x₃) Î R³ | x₁x₂x₃ = 0}
Suppose that U is a vector subspace of V, and V is a vector subspace of W. Show that U is a vector subspace of W.
Show that the intersection of any collection of vector subspaces of V is a vector subspace of V.
Let

l² = {(a_i) Î R^Ľ | Ľ
ĺ
i = 0
|a_i|² < Ľ}.

Show that l² is a vector subspace of R^Ľ.
Let

L² = {f Î C⁰(R) | ó
ő Ľ

-Ľ
|f(x)|² dx < Ľ}.

Show that L² is a vector subspace of C⁰(R).

Linear dependence and independence Top

From here on, we'll assume that U and V are vector spaces over a field F.

Suppose that v₁, v₂, ź, v_n are vectors in V, and that for some a₂, a₃, ź, a_n Î F,

v₁ = a₂v₂ + a₃v₃ + ź+ a_nv_n.

Then we say that v₁ is linearly dependent on v₂, ź, v_n. Note that if we move v₁ to the other side, and let a₁ = -1, we have a₁v₁ + a₂v₂ + ź+ a_nv_n = 0, and not all of the a_i are zero (in particular, a₁ š 0).
This motivates a general definition: a set S of vectors in a vector space V is called linearly dependent if, for some n > 0, and distinct v₁, v₂, źv_n Î S, there exist a₁, a₂, ź, a_n Î F, not all 0, with

a₁v₁ + a₂v₂ + ź+ a_nv_n = 0.
The flip side of this definition is the following: a set S of vectors in a vector space V is called linearly independent if, given distinct v₁, v₂, źv_n Î S, the only way for a₁, a₂, ź, a_n Î F to give

a₁v₁ + a₂v₂ + ź+ a_nv_n = 0

is if every one of the a_i are zero.
A useful way to think about this is that a set S of vectors is linearly independent if no individual one of the vectors is linearly dependent on a finite number of the rest of them.
Some examples:
1. In R², the sets of vectors {(1,0),(0,1)}, {(1,2),(1,3)}, {(1,0)} and {(2,2),(3,2)} are each linearly independent sets.
2. In R², none of the sets of vectors {(1,0),(2,0)}, {(1,0),(0,1), (1,1)}, {(0,0),(0,1)}, {(0,0)}, {(1,2),(2,3),(3,4),(4,1)}, nor {(1,2),(3,4),(2,1)} is a linearly independent set.
3. In any vector space, if a set of vectors contains the 0 vector, the set is not linearly independent.
4. In C⁰(R), if f₁(1) = f₂(2) = ź = f_n(n) = 1, but f_i(j) = 0 for i š j, then the set of functions {f₁, f₂, ź, f_n} is linearly independent.
5. In C^Ľ(R), the set of functions {cos(nx) | n = 0, 1, 2, ź} is a linearly independent set.
6. The empty set, {}, is a linearly independent set.
Mathematics is an extremely precise language. These two sentences do not mean the same thing:
1. The set {v₁, v₂, ź, v_n} of vectors is linearly independent.
2. {v₁, v₂, ź, v_n} is a set of linearly independent vectors.
One of the hardest parts of doing mathematics is developing your mathematical intuition. It is tempting to imagine that intuition is what you have before you know anything, but that is nonsense. Intuition is just the automatic part of your knowledge, derived from your past experience. Becoming better at mathematics involves learning new mathematics, and then integrating that new knowledge into your intuition. Doing that takes care, precision, and lots of practice!

Exercises: Linear dependence and independence Top

Verify the statements in each of the six examples in this section.
Suppose that P₀, P₁, ź, P_n are polynomials in Fⁿ[x], and P_i(1) = 0 for i = 0, 1, ź,n. Show that {P₀ , P₁, ź, P_n} is not linearly independent in Fⁿ[x].
In C^Ľ(R), show that each of these sets of functions is linearly independent:
1. {xⁿ | n = 0, 1, 2, ź}.
2. {sin(nx) | n = 0, 1, 2, ź}.
3. {e^nx | n = 0, 1, 2, ź}.

Span of a set of vectors Top

If S is a set of vectors in the vector space V over F, we define the span of the set S by:

span(S) = {a₁v₁ + a₂v₂ + ź+ a_nv_n |

a_i Î F, v_i Î S, n > 0}.

This says that span(S) is the set of all vectors that can be written as finite linear combinations of the vectors in S.
Examples:
1. In R¹, span({(1)}) is all of R¹.
2. In R³, span({(1,0,0), (0,1,0)}) is the x-y plane. Similarly, span({(1,0,0), (0,0,1)}) is the x-z plane, span({(0,1,0), (0,0,1)}) is the y-z plane, and span({(1,0,0), (0,1,0), (0,0,1)}) is all of R³.
3. Still in R³, span({(1,2,0), (2,1,0)}) is the x-y plane, span({(1,0,0)}) is the x axis, span({(1,0,0), (1,1,0), (1,1,1)}) is all of R³, span({(1,2,3), (1,1,0), (1,1,1), (2,1,4)}) is all of R³, and span({(1,2,0), (1,1,0), (3,1,0), (2,1,0)}) is the x-y plane.
4. In Fⁿ[x], span({1, x¹, x², ź, xⁿ}) = Fⁿ[x].
5. In F[x], span({1, x¹, x², ź}) = F[x].
6. In C⁰(R), span({f₁(x) = 1, f₂(x) = x}) is the set of all linear functions, y = mx + b.
7. For any set S, span(S) is a subspace.
8. In particular, span(S) is sometimes defined to be the smallest subspace of V containing S, or the intersection of all subspaces of V that contain S.

Exercises: Span of a set of vectors Top

Verify the statements in each of the eight examples in this section.
Show that if span({v₀, v₁, ź, v_n}) = V, then span({v₀ - v₁, v₁ - v₂, ź,v_n-1 - v_n, v_n}) = V.

Basis for a vector space Top

Suppose S is an ordered set (or list) of vectors in a vector space V. Suppose further that:
1. S is linearly independent, and
2. span(S) = V.
Then we call S a basis for the vector space V.
Given a basis S for the vector space V, every vector v Î V, v š 0, can be written in a unique way as:

v = a₁v₁ + a₂v₂ + ź+ a_nv_n

with a₁, a₂, ź, a_n Î F, a_i š 0, and v₁, v₂, ź, v_n Î S. For uniqueness, we take the v_i to be distinct, and to be in the order they appear in S. We are writing each v as a finite linear combination of the basis vectors.
Examples:
1. In F³, the ordered set S = ((1,0,0),(0,1,0),(0,0,1)) is a basis (often called the standard basis). This generalizes in the obvious way to Fⁿ.
2. In F³, the ordered set S = ((1,0,0),(1,1,0),(1,1,1)) is a basis. So is ((1,2,3),(2,1,3),(1,2,2)).
3. In general, in Fⁿ any linearly independent ordered set S = (v₁, v₂, ź, v_n) of size n is a basis.
4. (1, x, x², x³, ź) is a basis for F[x].
5. If S is a linearly independent ordered set, then S is a basis for span(S).
If a vector space V has a finite basis S = (v₁, v₂, ź, v_n), we say that V is finite dimensional, and we define the dimension of V by:

dim(V) = n.

We define dim({0}) = 0.
Note that if a vector space V has a finite basis of size n, then every basis for V contains n vectors, and thus the definition makes sense.
For example, dim(Fⁿ) = n for any field F and n > 0.
We also have that dim(Fⁿ[x]) = n + 1.
If a vector space V has a finite or countably infinite basis S, then we can uniquely represent each vector v with respect to S by a list of the form:
v = (b₁, b₂, ź, b_n) in the finite case, or
v = (b₁, b₂, ź) in the infinite case.
Each b_i is either the non-zero coefficient corresponding with the ith element of S from the unique representation described above, or 0 if the basis element does not appear there. In the infinite case, only finitely many of the b_i are non-zero.
We represent 0 by (0, 0, ź, 0) in the finite case, and by (0, 0, ź) in the infinite case.
In these cases, we also have, for v_i Î S, that

V = span(v₁)ĹźĹspan(v_n)

in the finite case, and

V = span(v₁)Ĺspan(v₂)Ĺź

in the countably infinite case.

Exercises: Basis for a vector space Top

Verify the statements in each of the five examples in this section.
Let U be the subspace of R⁶ given by U = {(x₁, x₂, x₃, x₄, x₅, x₆) Î R⁶ | 2x₁ + 3x₃ = 0 and x₂ = 4x₅}.
Find a basis for U.
Show that if U₁ and U₂ are subspaces of a finite dimensional vector space, then dim(U₁ + U₂) = dim(U₁) + dim(U₂) - dim(U₁ ÇU₂).

Linear transformations Top

If U and V are vector spaces over F, then a function T : U Ž V is called a linear transformation or linear mapping if

T(a₁u₁ + a₂u₂) = a₁T(u₁) + a₂T(u₂)

for all a₁, a₂ Î F and u₁, u₂ Î U.
An equivalent pair of conditions is that T(u₁ + u₂) = T(u₁) + T(u₂), and T(au) = aT(u).
For a linear transformation T : U Ž V, we call U the domain and V the codomain (or sometimes range) of T. We also define the kernel (or null space) of T by:

ker(T) = {u Î U | T(u) = 0}.

Further, we define the image (or sometimes range - be careful here!) of T by:

im(T) = {v Î V | v = T(u) for some u Î U}.

For a linear transformation T : U Ž V we have the nice properties:

ker(T) is a subspace of U, and im(T) is a subspace of V. (ex)
If U is finite dimensional, then

dim(U) = dim(ker(T)) + dim(im(T)).

(ex)
If U is finite dimensional with basis (u₁, u₂, ź, u_n), and V is finite dimensional with basis (v₁, v₂, ź, v_m), then T is determined by its effect on the basis elements u_i. There exist a_ij, 1 Ł i Ł m, 1 Ł j Ł n with:

T(u_j) = a_1jv₁ + a_2jv₂ + ź+ a_mjv_m,

and in general, if u = ĺ_jb_ju_j, then

T(u) = m
ĺ
i = 1
n
ĺ
j = 1
a_ijb_jv_i.
Thus, given particular bases for U and V, we can represent the linear transformation T by the matrix

[T] = é
ę
ę
ę
ę
ë

a₁₁
a₁₂
ź
a_1n

a₂₁
a₂₂
ź
a_2n

:
:
:
:

a_m1
a_m2
ź
a_mn
ů
ú
ú
ú
ú
ű

or [T]_ij = a_ij.

If we represent the vector u Î U by the column matrix [b₁, b₂, ź, b_n]^t, then we have

[T(u)]

[T][u]

é
ę
ę
ę
ę
ë

a₁₁

a₁₂

a_1n

a₂₁

a₂₂

a_2n

a_m1

a_m2

a_mn

ů
ú
ú
ú
ú
ű

é
ę
ę
ę
ę
ë

b₁

b₂

b_n

ů
ú
ú
ú
ú
ű

é
ę
ę
ę
ę
ë

a₁₁b₁ + a₁₂b₂ + ź+ a_1nb_n

a₂₁b₁ + a₂₂b₂ + ź+ a_2nb_n

a_m1b₁ + a_m2b₂ + ź+ a_mnb_n

ů
ú
ú
ú
ú
ű

Sometimes it is more convenient to use the Einstein summation conventions, where we would write:

T(u)_i = T_i^ju_j

with implied summation over the repeated upper and lower indices.
If T₁ : U Ž V and T₂ : U Ž V are two linear transformations, we can define the sum of the two transformations as

(T₁ + T₂)(u) = T₁(u) + T₂(u),

and we can define scalar multiplication of a linear transformation T by a as

(a * T)(u) = a * (T(u)).

We can thus define L(U, V) to be the space of all linear transformations from U to V. We define the zero transformation 0 : U Ž V by 0(u) = 0. With these definitions, L(U, V) is a vector space. (ex)
Given particular bases for U and V, the matrix representation of T₁ + T₂ is given by: [T₁ + T₂] = [T₁] + [T₂].
If U and V are finite dimensional, then L(U, V) is also finite dimensional, and

dim(L(U, V)) = dim(U)*dim(V).

(ex)
If T : U Ž V and S : V Ž W are linear transformations, then the composition S °T : U Ž W is also a linear transformation. (ex)
Recall that (S °T)(u) = S(T(u)).
Given particular bases for U, V, and W, the matrix representation of S °T is the matrix product [S °T] = [S][T]. We usually abbreviate S °T as ST.
It is worth noting that unless W Í U, it doesn't even make sense to talk about T °S.

The matrix of the composition of two linear transformations is given by the product of the two matrices, given by:

[ST] = [S][T] =

é
ę
ę
ę
ę
ë

a₁₁

a₁₂

a_1n

a₂₁

a₂₂

a_2n

a_m1

a_m2

a_mn

ů
ú
ú
ú
ú
ű

é
ę
ę
ę
ę
ë

b₁₁

b₁₂

b_1p

b₂₁

b₂₂

b_2p

b_n1

b_n2

b_np

ů
ú
ú
ú
ú
ű

é
ę
ę
ę
ę
ę
ę
ę
ę
ę
ę
ë

n
ĺ
i = 1

a_1ib_i1

n
ĺ
i = 1

a_1ib_i2

n
ĺ
i = 1

a_1ib_ip

n
ĺ
i = 1

a_2ib_i1

n
ĺ
i = 1

a_2ib_i2

n
ĺ
i = 1

a_2ib_ip

n
ĺ
i = 1

a_mib_i1

n
ĺ
i = 1

a_mib_i2

n
ĺ
i = 1

a_mib_ip

ů
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ű

We also have the distributive property:

S(T₁ + T₂) = (ST₁) + (ST₂).

The matrix representation for this is:

[S][T₁ + T₂] = [S][T₁] + [S][T₂].

Some examples of linear transformations:
1. The function T : R² Ž R given by T((x, y)) = x + y is linear.
2. The function T : R² Ž R³ given by T((x, y)) = (x + 2y, x - 2y, x - y) is linear.
3. Given a₁, a₂, ź, a_n Î F, and V a finite dimensional vector space over F with basis (v₁, v₂, ź, v_n), the function T : V Ž F given by T(b₁v₁ + ź+ b_nv_n) = a₁b₁ + ź+ a_nb_n is linear.
  In general, for a vector space V, a linear transformation T : V Ž F is called a linear functional. The study of these transformations is called functional analysis.
4. The function D : Fⁿ[x] Ž F^n-1[x] given by
  D(a_nxⁿ + ź+ a₁x + a₀)
  = na_nx^n-1 + ź+ 2a₂x + a₁
  is linear. This linear transformation is called the derivative ...
5. Similarly, we have the derivative transformation D: F^Ľ[x] Ž F^Ľ[x] given by
  
  D( Ľ
  ĺ
  n = 0
  a_nxⁿ) = Ľ
  ĺ
  n = 1
  n*a_nx^n-1.
6. The shift map S : F^Ľ Ž F^Ľ is given by: S((a₀, a₁, ź)) = (a₁, a₂, ź).
7. The difference map D: F^Ľ Ž F^Ľ is given by:
  
  D((a₀, a₁, ź)) = (a₁ - a₀, a₂ - a₁, ź),
  
  or, abbreviating (a₀, a₁, ź) by (a_n),
  
  D((a_n)) = (a_n+1 - a_n).

Exercises: Linear transformations Top

Verify each of the statements marked (ex) in this section.
Verify that each of the 7 examples actually are linear transformations.
Show that the usual calculus derivative [d / dx] : C^Ľ(R) Ž C^Ľ(R) given by:

d f(x)
dx
=
lim
hŽ 0
f(x+h) - f(x)
h

is a linear transformation.
Is the usual calculus indefinite integral

ó
ő f(x)dx

a linear transformation? Why or why not? What about the definite integral?

Morphisms - mono, epi, and iso Top

In algebra, we call a function that preserves structure a morphism. In our current context, a linear transformation preserves the linear structure of a vector space in the sense that T(u₁ + u₂) = T(u₁) + T(u₂) and T(au) = aT(u). Thus, linear transformations are morphisms in the category of vector spaces.
We will be particularly interested in morphisms that have additional properties. Specifically, we are likely to look for morphisms T : U Ž V which have one or more of the properties:
1. One-to-one. Such transformations are also called injective, or monomorphisms. These transformations have the property that if u₁ š u₂, then T(u₁) š T(u₂). This says that different things get sent different places - that is, no two things get sent to the same place.
  Monomorphisms are nice because the subspace im(T) Ě V looks just like U.
2. Onto. Such transformations are also called surjective, or epimorphisms. These transformations have the property that for every v Î V, there exists some u Î U with T(u) = v. This says that every element of V is hit by some element of U. Another way to say this is that im(T) = V.
  Epimorphisms are nice, because the algebraic properties of V will be reflected back in U.
3. Both one-to-one and onto. Such transformations are also called bijective. A bijective function f : X Ž Y has an inverse f^-1 : Y Ž X with the properties that for all x Î X and all y Î Y, f^-1(f(x)) = x and f(f^-1(y)) = y.
  A bijective morphism whose inverse also preserves algebraic structure is called an isomorphism. In linear algebra, we have the nice property that if a linear transformation is bijective, then its inverse is also linear, and thus it is an isomorphism.
4. If there is an isomorphism T : U Ž V, we say that the spaces U and V are isomorphic. If two spaces are isomorphic, then they share all relevant properties. Thus, two isomorphic vector spaces are indistinguishable as vector spaces except for a renaming of the elements.
We have the nice fact that a linear transformation T : U Ž V is a monomorphism (is one-to-one) if and only if ker(T) = {0}.
Pf.: Suppose T is a monomorphism. We know that for every linear transformation, T(0) = 0. Then, since T is a monomorphism, we know that if T(u) = 0 = T(0), it must be that u = 0. Thus ker(T) = {0}.
On the other hand, suppose that ker(T) = {0}. Then, if T(u₁) = T(u₂), we will have 0 = T(u₁) - T(u₂) = T(u₁ - u₂). But this means that u₁ - u₂ Î ker(T) and hence, since we are assuming that ker(T) = {0}, we must have u₁ - u₂ = 0, or u₁ = u₂. By the contrapositive, this means that if u₁ š u₂, then T(u₁) š T(u₂). Q.E.D.
(I had to do at least one proof,
didn't I? :-)
An important example of an isomorphism is the identity transformation I_U : U Ž U given by I_U(u) = u. (ex)
If T : U Ž V is a monomorphism, then there is a linear transformation S₁ : im(T) Ž U with S₁(T(u)) = (S₁T)(u) = u for all u Î U. This says that S₁T = I_U. S₁ is called a left (partial) inverse of T. (ex)
If T : U Ž V is an epimorphism, then there is a linear transformation S₂ : V Ž U with T(S₂(v)) = (TS₂)(v) = v for all v Î V. This says that TS₂ = I_V. S₂ is called a right (partial) inverse of T. (ex)
If T : U Ž V is an isomorphism, then since T is an epimorphism, both S₁ and S₂ (as above) exist. Also, im(T) = V. We therefore have that for all v Î V,

S₁(v) = S₁((TS₂)(v)) = (S₁T)(S₂(v)) = S₂(v).

Thus S₁ = S₂. In this case, the (two-sided) inverse of T exists, and we have T^-1 = S₁ = S₂.
If U and V are finite dimensional vector spaces over F with dim(U) = dim(V), then U and V are isomorphic. (ex)
(Big) hint for proof: Let (u₁, u₂, ź, u_n) and (v₁, v₂, ź, v_n) be bases for U and V respectively. Define T : U Ž V by T(u_i) = v_i for 1 Ł i Ł n, and extend by linearity. Make sense of the phrase ``extend by linearity,'' and then show that T is an isomorphism.
We also have the nice fact that if dim(U) = dim(V) = n, U, V over F, and T : U Ž V is linear, then the following are equivalent: (ex)
1. T is a monomorphism.
2. T is an epimorphism.
3. T is an isomorphism.
This means, for example, that such a T is onto if and only ker(T) = 0.
If T : U Ž V and S : V Ž W, then (ex)
1. If both S and T are monomorphisms, then so is ST.
2. If ST is a monomorphism, then so is T.
3. If both S and T are epimorphisms, then so is ST.
4. If ST is an epimorphism, then so is S.
If T : V Ž V and S : V Ž V, and ST = TS, then (ex)
1. Both S and T are monomorphisms if and only if ST is a monomorphism.
2. Both S and T are epimorphisms if and only if ST is an epimorphism.
Let matrix(F, n, m) be the space of all n by m matrices with entries from F. We use ordinary entry by entry matrix addition, where (A + B)_ij = A_ij + B_ij, scalar multiplication, where (aA)_ij = aA_ij, and let (0)_ij = 0. Then matrix(F, n, m) is an n*m-dimensional vector space over F. If dim(U) = m and dim(V) = n, then we can define

mat : L(U, V) Ž matrix(F, n, m)

by mat(T) = [T].
This transformation is an isomorphism. (ex)

Exercises: Morphisms - mono, epi, and iso Top

Verify each of the statements marked (ex) in this section.
Show that if a function has an inverse, then the inverse is unique.
Consider the function T : Fⁿ[x] Ž Fⁿ⁺¹ given by

T(a₀ + a₁x + ź+ a_nxⁿ) = (a₀, ź, a_n).

Show that this function is an isomorphism.
Consider the function T : CŽ matrix(R, 2, 2) given by

T(a + bi)

=

é
ę
ë

a
-b

b
a
ů
ú
ű
1. Show that if we consider C as a real vector space, this function is a monomorphism.
2. Show that this function also respects complex multiplication, that is,
  
  T((a + bi)(c + di)) = T(a + bi) T(c + di).
More generally, consider the function T : matrix(C, n, n) Ž matrix(R, 2n, 2n) given by

T ć
ç
ç
ç
č é
ę
ę
ę
ë

a₁₁ + b₁₁i
ź
a_1n + b_1ni

:

:

a_n1 + b_n1i
ź
a_nn + b_nni
ů
ú
ú
ú
ű ö
÷
÷
÷
ř

= é
ę
ę
ę
ę
ę
ę
ę
ę
ë

a₁₁
-b₁₁
ź
a_1n
-b_1n

b₁₁
a₁₁
ź
b_1n
a_1n

:
:

:
:

a_n1
-b_n1
ź
a_nn
-b_nn

b_n1
a_n1
ź
b_nn
a_nn
ů
ú
ú
ú
ú
ú
ú
ú
ú
ű
1. Show that if we consider matrix(C, n, n) as a real vector space, this function is a monomorphism.
2. Show that this function also respects matrix multiplication, that is,
  
  T(A B) = T(A) T(B).

What the heck. Let H be the quaternions (described above). Consider the function T : matrix(H, n, n) Ž matrix(C, 2n, 2n) given by

ć
ç
ç
ç
č

é
ę
ę
ę
ë

a₁₁ + b₁₁j

a_1n + b_1nj

a_n1 + b_n1j

a_nn + b_nnj

ů
ú
ú
ú
ű

ö
÷
÷
÷
ř

é
ę
ę
ę
ę
ę
ę
ę
ę
ę
ę
ę
ę
ë

a₁₁

b₁₁

a_1n

b_1n

b₁₁

a₁₁

b_1n

a_1n

a_n1

b_n1

a_nn

b_nn

b_n1

a_n1

b_nn

a_nn

ů
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ú
ű

Show that if we consider matrix(H, n, n) as a complex vector space, this function is a monomorphism.
Show that this function also respects matrix multiplication, that is,

T(A B) = T(A) T(B).

Thus, if we denote by O(n), U(n), and Sp(n) the distance preserving linear operators on Rⁿ, Cⁿ, and Hⁿ respectively (called the orthogonal, unitary, and symplectic groups), then we have the monomorphisms:

źŽ O(n) Ž Sp(n) Ž U(2n) Ž O(4n)

Ž Sp(4n) Ž U(8n) Ž O(16n) Ž ź

Linear operators Top

If T : V Ž V is a linear transformation, we call T a linear operator on V. Note that if S and T are operators on V, then so is ST. We can abbreviate L(V, V) as L(V).
L(V) has the algebraic structure of a ring with identity. A ring is similar to a field (as defined above), except without the requirements that multiplication be commutative and that there be multiplicative inverses for non-zero elements. The identity element is I_V. L(V) is a non-commutative ring, since in general ST š TS. This is reflected in the fact that matrix multiplication is non-commutative. Only in very special cases is it true that [S][T] = [T][S] (for example, if both [S] and [T] are diagonal matrices, with [S]_ij = 0 for i š j, and [T]_ij = 0 for i š j).
Note that for operators, it makes sense to talk about T °T, and we can thus define Tⁿ = T °T °ź°T (n times). We also define T⁰ = I, the identity operator.
Thus, if we have a polynomial P(x) = a_nxⁿ + a_n-1x^n-1 + ź + a₁x + a₀ from F[x], we can talk about the polynomial in T:

P(T) = a_nTⁿ + a_n-1T^n-1 + ź+ a₁T + a₀.

(The a₀ term stands for a₀I.) This is an operator on V which acts on vectors as:

P(T)(v) = a_nTⁿ (v) + ź+ a₁T(v) + a₀v.
In fact, we can generalize this to power series. If

p(x) = Ľ
ĺ
n = 0
a_nxⁿ,

then we can define

p(T) = Ľ
ĺ
n = 0
a_nTⁿ.
This means, for example, that we can define the exponential of an operator:

exp(T) = Ľ
ĺ
n = 0
1
n!
Tⁿ.
We can even define the cosine or sine of an operator, etc.:

cos(T) = Ľ
ĺ
n = 0
(-1)ⁿ
(2n)!
T²ⁿ,

sin(T) = Ľ
ĺ
n = 0
(-1)ⁿ
(2n+1)!
T²ⁿ⁺¹.
We can talk about square roots of an operator, saying that S is a square root of T if S² = T. It is unlikely that S is unique.
We can also talk about logarithms of an operator, saying that S is a logarithm of T if exp(S) = T. Again, it is unlikely that S is unique.
Examples of linear operators:
1. A first important example begins like this: suppose we have a system of m linear equations in n variables
  
  a₁₁x₁ + a₁₂x₂ + ź+ a_1nx_n
  
  =
  
  b₁
  a₂₁x₁ + a₂₂x₂ + ź+ a_2nx_n
  
  =
  
  2₁
  :
  
  :
  a_m1x₁ + a_m2x₂ + ź+ a_mnx_n
  
  =
  
  b_m.
  
  If we let A : Fⁿ Ž F^m be the linear transformation with [A]_ij = a_ij, x be the vector (x₁, x₂, ź, x_n)^t, and b the vector (b₁, b₂, ź, b_m)^t, then we can rewrite the equation in the form:
  
  Ax = b.
  
  We then have several possibilities.
  1. The first case is when m = n. There are then two possibilities:
    1. If A is an isomorphism, then A^-1 exists, and we can solve the equation for x:
      
      x = A^-1Ax = A^-1b.
    2. If A is not an isomorphism, then in particular A is not a an epimorphism, and thus it is possible that b Ď im(A). In this case, there are no solutions to the equation.
      On the other hand, if b Î im(A) there is at least one solution x₀ with Ax₀ = b.
      Note, though, that we also know A is not a monomorphism, and hence dim(ker(A)) ł 1. Then, if z Î ker(A), we have A(x₀ + z) = Ax₀ + Az = Ax₀ + 0 = Ax₀ = b, and so x₀ + z is another solution. Furthermore, if y is another solution with Ay = b, then A(x₀ - y) = Ax₀ -Ay = b - b = 0. This means x₀ - y Î ker(A), and so y = x₀ + z for some z Î ker(A).
      Thus, if x₀ is a particular solution, then every solution is of the form x₀ + z for some z Î ker(A). The space of solutions is then a translation of the kernel of A, of the form x₀ + ker(A). We then only need to find one particular solution.
      In this case, we have broken the problem down into two parts: first, we solve Ax = 0 (called the homogeneous equation), then we find a single solution Ax₀ = b. For F = R or C, there will be infinitely many solutions.
  2. The second case is when m < n. In this case, A cannot be a monomorphism, and things are then similar to the second part of the previous case. If A is an epimorphism, there is sure to be at least one solution. Again, we solve the homogeneous case, and then find one particular solution. If A is not an epimorphism, there may not be any solutions. Otherwise, as above, in general we will have infinitely many solutions.
  3. If m > n, then A cannot be an epimorphism, and hence there may not be any solutions. If A is a monomorphism, if there is a solution, there will be just one. If A is not a monomorphism, if there is one solution, there will be infinitely many, as above.
  4. In all three cases, we can start by trying to find the inverse (or left partial inverse) of A. If A is a monomorphism, we can expect to be successful, and to be able to find the unique solution if it exists. If A is not a monomorphism, we solve the homogeneous equation, and then look for a single particular solution.
  5. Question: How can we tell if A is a monomorphism? How can we find A^-1, or at least S₁, the left partial inverse of A?
2. A second important example is the derivative operator
  
  D: C^Ľ(R) Ž C^Ľ(R).
  
  Note that this is a linear operator, since D(f + g) = D(f) + D(g), and D(af) = aD(f).
  Note that ker(D) = span({1}), the one-dimensional space consisting of all constant functions. If we collapse ker(D) down to nothing (in technical terms, form the quotient space ...) then we can think of D as an isomorphism (on the quotient space). D has an inverse,
  
  ó
  ő : C^Ľ(R) Ž C^Ľ(R).
  
  We then have (a variant of) the fundamental theorem of calculus:
  
  D ó
  ő (f) = ó
  ő D(f) = f
  
  (on the reduced space - that is, up to a constant).
3. We can then consider the general linear ordinary differential operators with constant coefficients. These are operators in R[D], that is operators of the form:
  
  P(D) = a_nDⁿ + ź+ a₁D+ a₀.
  
  These operators act on a function f Î C^Ľ(R) as
  
  P(D)(f) = a_nDⁿ(f) + ź+ a₁D(f) + a₀f.
  
  We then have the general nth order linear ordinary differential equation with constant coefficients:
  
  P(D)(f) = g.
  
  We can work on solving these equations with an approach similar to the method for systems of linear equations.
  We first note that if the function f₀ is a solution to this equation, and z = z(x) Î ker(P(D)), then f₀ + z is also a solution, and if f₁ is another solution, then P(D)(f₀ - f₁) = P(D)(f₀) - P(D)(f₁) = g - g = 0. Thus, all solutions are of the form f₀ + z where f₀ is some particular solution, and z Î ker(P(D)).
  We thus separate the problem into two parts. First we solve the associated homogeneous equation:
  
  P(D)(f) = 0
  
  to find ker(P(D)), and then we look for a single particular solution to the original equation.
  In general, we have that dim(ker(P(D))) = n, the degree of P. This is not an entirely obvious fact, but it is not counterintuitive ...
  Hence what we need to do is find n functions which form a basis for ker(P(D)). What we need, then, are n linearly independent functions each of which is a solution to the homogeneous equation.
  In theory (:-) this is not too hard.
  We note first that for the first-order case, we have the solution:
  
  (D- r)(e^rx)
  
  =
  
  D(e^rx) - re^rx
  
  =
  
  re^rx - re^rx
  
  =
  
  0,
  
  and, in the kth order extension of this,
  
  (D- r)^k(x^k-1e^rx) = 0.
  
  We also have that the set of functions A = {x^je^r_ix | 0 Ł j Ł k, 1 Ł i Ł m}
  is linearly independent. From this we see how to solve equations of the form:
  
  Ő
  i
  (D- r_i)^k_i(f) = 0.
  
  Now, consider the operator
  
  D² - 2sD+ s² + t²
  
  for s, t Î R.
  We have that
  
  (D² -2sD+ s² + t²)^k(x^k-1e^sxcos(tx)) = 0,
  
  (D² -2sD+ s² + t²)^k(x^k-1e^sxsin(tx)) = 0.
  
  We can think of this as a = s + ti, and that we are working with
  
  D² - 2sD+ s² + t² = (D- a)(D-
  a
  
  ).
  
  We have that the set of functions
  
  B = {x^je^s_kxcos(t_kx), x^me^s_nxsin(t_nx)}
  
  is also linearly independent. From this we see how to solve equations of the form:
  
  Ő
  i
  (D² -2s_iD+ s_i² + t_i²)^k_i(f) = 0.
  
  We can now put together all the pieces to solve the homogeneous equation P(D) = 0. We use the fact that any polynomial over R can be completely factored as
  
  P(D) =
  Ő
  i
  (D- r_i)^k_i
  Ő
  j
  (D-a_j)^k_j (D-
  a
  
  j
  )^k_j
  
  with r_i Î R and a_j Î C.
  To solve the inhomogeneous equation, we need only to find one particular solution of P(D)(f) = g.
  This is just the bare beginnings of techniques for solving differential equations, but it gives the flavor of some relatively powerful methods, and the role that linear algebra plays. I haven't even mentioned the issues of initial values/boundary conditions. For much more on these topics, look at a book such as Elementary Differential Equations with Linear Algebra, by Finney and Ostberg.
4. These approaches generalize to linear ordinary differential operators with non-constant coefficients:
  
  a_n(x)Dⁿ + a_n-1(x)D^n-1 + ź+ a₀(x),
  
  to systems of linear ordinary differential operators, and on to linear partial differential operators and systems of linear partial differential operators. But I think for now someone else will have to write that up ... :-)
5. We can develop a similar approach to solving linear discrete difference equations. For example, the difference equation (a_n+2 - a_n+1 - a_n) = (0) has as a solution the Fibonnacci sequence (a_n) = (1, 1, 2, 3, 5, 8, ź). We could work with the discrete version D of the differential operator D, where Df(x) = (f(x + 1) - f(x))/1 = f(x + 1) - f(x). In the discrete case, we can't let h go to zero.
  We would then have D((a_n)) = (a_n+1 - a_n). Our example difference equation would be
  
  (D² + D-1)((a_n)) = (0).
  
  Often, however, it is more convenient to work with the discrete increment operator E, with E(f(x)) = f(x + 1), or E((a_n)) = (a_n+1). Both D and E are linear operators. Our example equation is then E²((a_n)) = E((a_n)) + (a_n), or
  
  (E² - E - 1)((a_n)) = (0).
  
  In a general form, we have a polynomial P(x) for an equation of the form P(E)((a_n)) = (0). We can then find a general solution by using the facts that
  
  (E - a)^k((n^jaⁿ)) = (0)
  
  for 0 Ł j < k, and
  
  {(n^jaⁿ) | 0 Ł j < k}
  
  is a linearly independent set, etc.
6. Of course, all of these are the easy cases. The hard ones are the nonlinear equations ...
7. At times, this reminds me of a comment made by my Ph.D. thesis advisor, after I had finished the proof of my main result for all odd primes. He said there were three ways to think about it:
  1. I had handled infinitely many cases, and omitted only one, the even prime 2.
  2. I had done half the cases. I had handled all the odd primes, but none of the even ones.
  3. I had dealt with all the uninteresting cases, but not the single interesting case :-)

Exercises: Linear operators Top

Find a linear operator S such that

S²

=

é
ę
ë

-1
0

0
-1
ů
ú
ű
If S is the operator

S

=

é
ę
ę
ę
ę
ę
ę
ę
ę
ë

0
- p
4

p
4

0
ů
ú
ú
ú
ú
ú
ú
ú
ú
ű ,

what is exp(S)?
Solve the system of equations:

3x₁
+
x₂
-
2x₃
=
3

x₁
-
2x₂
-
x₃
=
1

2x₁
-
4x₂
-
x₃
=
3
Solve the differential equation:

d⁵f
dx⁵
- 7 d⁴f
dx⁴
+ 23 d³f
dx³
- 45 d²f
dx²
+ 48 df
dx
-20f

= 26cos(x) - 18sin(x)

Hint:

x⁵ - 7x⁴ + 23x³ - 45x² + 48x - 20

= (x - 1)(x - 2)²(x² - 2x + 5)
Find the general solution to the difference equation

(E² - E - 1)((a_n)) = (0)
Find the general solution to the difference equation

(E⁴ + 2E² + 1)((a_n)) = (0)
Verify that
1. D, the derivative, is a linear operator.
2. (D- r)^k(x^k-1e^rx) = 0
  
  (D² -2sD+ s² + t²)^k(x^k-1e^sxcos(tx))
  
  =
  
  0
  (D² -2sD+ s² + t²)^k(x^k-1e^sxsin(tx))
  
  =
  
  0
3. E, the discrete increment, is a linear operator.
4. (E - a)^k((n^jaⁿ)) = (0) for 0 Ł j < k.
What happens in nonlinear cases? Sometimes they are manageable, sometimes not.
1. Solve the differential equation
  
  Df = b * f * (1 - f).
2. What can be said about the difference equation
  
  E((a_n)) = (b * a_n * (1 - a_n))?
  
  Note: this is often called the logistics equation.

Normed linear spaces Top

A norm on a real or complex vector space V is a function

|| · || : V Ž R

with the properties, for v, v₁, v₂ Î V,
1. ||v|| ł 0
2. ||v|| = 0 if and only if u = 0
3. ||av|| = |a| ||v||
4. ||v₁ + v₂|| Ł ||v₁|| + ||v₂||.
A space with such an associated function is called a normed linear space.
Using the norm, we can define a topology on the space, and can then talk about continuity of functions or operators, limits of sequences, etc.
I won't go into this much here beyond a few examples, but good places to look are books on Hilbert Spaces and/or functional analysis. There are a few books indicated in the references.
Using the norm, we can define a metric on V by: d(v₁, v₂) = ||v₁ - v₂||. Metrics have the properties:
1. d(v₁, v₂) ł 0
2. d(v₁, v₂) = 0 iff v₁ = v₂
3. d(v₁, v₂) = d(v₂, v₁)
4. d(v₁, v₂) Ł d(v₁, v₃) + d(v₃, v₂)
  (the triangle inequality).
Some examples:
1. On Rⁿ or Cⁿ, we have the norm
  
  ||(a₁, a₂, ź, a_n)||₂ = (|a₁|² + ź+ |a_n|²)^1/2.
  
  This is usually called the Euclidean norm.
  We can also think of this in terms of the inner product given by
  < (a₁, a₂, ź, a_n), (b₁, b₂, ź, b_n) >
  = a₁[`b]₁ + a₂[`b]₂ + ... + a_n[`b]_n.
  We then have ||v||₂ = < v, v > ^1/2.
2. We can generalize, for p > 0, to
  
  ||(a₁, a₂, ź, a_n)||_p = (|a₁|^p + ź+ |a_n|^p)^1/p,
  
  and
  
  ||(a₁, a₂, ź, a_n)||_Ľ = max
  (|a₁|, ź, |a_n|).
  
  These are called the p-norms and the Ľ-norm (or max-norm).
  A fun little exercise is to draw the circle of radius 1 in R² for each of these norms:
  
  circle_p(1) = {(x, y) Î R² | ||(x, y)||_p = 1}
  
  for p > 0, or for p = Ľ.
  One of these constitutes a ``proof'' that a square is a circle :-)
3. We can generalize this to R^Ľ or C^Ľ:
  
  ||(a₁, a₂, ź)||_p
  
  =
  
  ć
  č Ľ
  ĺ
  n = 1
  |a_n|^p ö
  ř 1/p
  
  ,
  ||(a₁, a₂, ź)||_Ľ
  
  =
  
  sup
  n
  (|a_n|).
  
  For this to make sense, we will have to limit ourselves to the subspaces where the sum converges:
  
  l^p
  
  =
  
  {(a₁, a₂, ź) | ć
  č Ľ
  ĺ
  n = 1
  |a_n|^p ö
  ř 1/p
  
  < Ľ},
  l^Ľ
  
  =
  
  {(a₁, a₂, ź) |
  sup
  n
  (|a_n|) < Ľ}.
4. On C⁰(R) or C⁰(C), we can define
  
  ||f||_p = ć
  č ó
  ő |f|^p ö
  ř 1/p
  
  ||f||_Ľ =
  sup
  x
  (|f(x)|).
  
  Again, we limit ourselves to the subspaces where these are < Ľ, and call the corresponding spaces L^p or L^Ľ.
5. We can think each of these Euclidean norms, || · ||₂, as coming from an inner product:
  
  ||(a₁, a₂, ź)||₂
  
  =
  
  < (a₁, ź), (a₁, ź) > ^1/2
  
  =
  
  ć
  č Ľ
  ĺ
  n = 1
  a_n _
  a
  
  n
  ö
  ř 1/2
  
  ||f||₂
  
  =
  
  < f, f > ^1/2
  
  =
  
  ć
  č ó
  ő f _
  f
  
  ö
  ř 1/2
  
  .
  
  When a complex Euclidean normed linear space is complete (that is, every Cauchy sequence converges), it is called a Hilbert space.

Exercises: Normed linear spaces Top

Verify that each of the examples actually are norms.
In R², draw the unit circles
1. circle₁(1) = {(x, y) Î R² | ||(x, y)||₁ = 1}
2. circle_3/2(1) = {(x, y) Î R² |
  ||(x, y)||_3/2 = 1}
3. circle₂(1) = {(x, y) Î R² | ||(x, y)||₂ = 1}
4. circle₃(1) = {(x, y) Î R² | ||(x, y)||₃ = 1}
5. circle_Ľ(1) = {(x, y) Î R² |
  ||(x, y)||_Ľ = 1}

Suppose that V is a real or complex vector space. An inner product on V is a conjugate-bilinear function on V:

< ·, · > : V x V Ž F

where, for all v₁, v₂, v Î V, and a Î F,

< v , v >

0 iff v = 0

< v₁ + v₂ , v >

< v₁ , v > + < v₂ , v >

< v₁ , v₂ >

< v₂ , v₁ >

< av₁ , v₂ >

a < v₁ , v₂ > .

Show that for an inner product,

< v , v₁ + v₂ >

< v , v₁ > + < v , v₂ >

< v₁ , av₂ >

< v₁ , v₂ > .

Show that for a finite dimensional real or complex vector space V with basis (v₁, v₂, ź, v_n), the function f : V x V Ž F given by

ć
č

ĺ
i

a_i v_i,

ĺ
i

b_i v_i

ö
ř

ĺ
i

a_i

b_i

is an inner product.

Recall that a linear functional on a vector space V is a linear map f : V Ž F. For a finite dimensional real or complex inner product space V, define the dual space of V to be the space

V^* = { f : V Ž F | f is a linear functional}

Show that V^* is a vector space over F, and that V and V^* are isomorphic to each other.
Hint: Show that every f Î V^* corresponds with a function of the form

< v_f , · > : V Ž F

for some v_f Î V.

Eigenvectors and eigenvalues Top

Suppose T is a linear operator on V. Then l Î F is called an eigenvalue or characteristic value of T if there exists v š 0 in V with T(v) = lv. In this case we call v an eigenvector or characteristic vector of T.
An equivalent definition is that l is an eigenvalue of T if and only if (T - lI)v = 0 for some v š 0. This follows from the fact that 0 = (T - lI)v = T(v) - lv if and only if T(v) = lv. Note that this also means that ker(T - lI) š {0}. Thus, l is an eigenvalue if and only if (T - lI) is not a monomorphism. In the finite dimensional case, this is equivalent to (T - lI) not being invertible.
Now suppose that l₁, ..., l_k are distinct eigenvalues of T (i.e., l_i š l_j for i š j), with associated eigenvectors v₁, ź, v_k. Then {v₁, ź, v_k} is a linearly independent set.
This means in particular that if T Î L(V) with dim(V) = n, and T has n distinct eigenvalues l_i with associated eigenvectors v_i, then the set S = (v₁, ź, v_n) is a basis for V. Furthermore, the matrix representation of T with respect to the basis S is

[T] = é
ę
ę
ę
ę
ë

l₁
0
ź
0

0
l₂
ź
0

:
:
:
:

0
0
ź
l_n
ů
ú
ú
ú
ú
ű

or [T]_ii = l_i, [T]_ij = 0 for i š j. This is also sometimes written as [T] = diag(l₁, ź, l_n).
Eigenvalues and eigenvectors can thus give us a very simple representation for T.
Not all linear operators have eigenvalues (or eigenvectors). For example, the linear operator T_q : R² Ž R² given by

[T_q] = é
ę
ë

cos(q)
-sin(q)

sin(q)
cos(q)
ů
ú
ű

is a (counter-clockwise) rotation around the origin through the angle q. If q is not an integral multiple of p, then T_q has no eigenvalues.
More generally, consider a non-zero linear operator T : R² Ž R². If 0 š u Î R², then there must be a linear combination of {u, T(u), T²(u)} with

0 = T²(u) + aT(u) + b(u) = P_u(T)(u).

If P_u(T) factors into linear factors P_u(T) = (T - l₁I)(T - l₂I) with l₁, l₂ Î R, then l₁ and l₂ are eigenvalues of T. On the other hand, if a² - 4b < 0, then T has no eigenvalues (it is a general rotation in R²).
On the other hand, every non-zero linear operator T : Cⁿ Ž Cⁿ has an eigenvalue. We can see this from the fact that, for any u š 0 in Cⁿ, the set of vectors {u, T(u), T²(u), ź, Tⁿ(u)} must be a linearly dependent set, and hence there are a₀, a₁, ź, a_n not all zero with

0 = n
ĺ
i = 0
a_iTⁱ(u) = P_u(T)(u).

We know that we can factor the polynomial P_u(T) over C into a product of linear factors

P_u(T) =
Ő
j
(T - z_jI)^k_j(T).

This linear operator is not a monomorphism (since u Î ker(P_u(T))). It is a product of commutative factors, and hence at least one of the factors (T - z_jI) is not a monomorphism. This says then that at least one of the z_j is an eigenvalue of T. We can proceed to find n eigenvalues and eigenvectors.
For for every non-zero linear operator T : Cⁿ Ž Cⁿ, there is a polynomial P_T(T) = ĺ_ia_iTⁱ of degree n, called the characteristic polynomial of T, with:
1. P_T(T) = 0. That is, P_T(T)(u) = 0 for all u Î Cⁿ.
2. The linear factors of P_T(T) are each of the form (T - l_jI) for some eigenvalue l_j. Also, a_n = 1. That is,
  
  P_T(T) =
  Ő
  j
  (T - l_jI)^k_j(T).
3. For each eigenvalue l_j, there is an eigenvector u_j. If k_j > 1, there is a linearly independent set of k_j eigenvectors for l_j. The set of n eigenvectors {u_i} is a linearly independent set, and thus is a basis for Cⁿ. As above, the matrix representation of T with respect to this basis is [T] = diag(l₁, ź, l_n) (with repetitions as necessary).
A non-zero linear operator T : Rⁿ Ž Rⁿ also has a characteristic polynomial of degree n. The big difference for the real case is that the polynomial will factor into a combination of linear and quadratic factors. There is then a basis consisting partially of eigenvectors and partially of pairs of basis vectors for two-dimensional subspaces on which T is a rotation. The matrix of T with respect to this basis then looks like [T] = diag(l₁, ź, l_k, A_k+1,ź,A_(n-k)/2) where each A_i is a two-by-two matrix without eigenvalues.
The set of eigenvalues of an operator is sometimes called the spectrum of the operator. In cases like Cⁿ where the eigenvectors form a basis for the space, using such as basis is sometimes known as the spectral decomposition of the operator.
Just briefly, let's glance at differential operators. D: C^Ľ(R) Ž C^Ľ(R) has uncountably many eigenvalues. Each real number a Î R is an eigenvalue since

D(e^ax) = ae^ax.

The corresponding eigenvectors are of course f_a(x) = e^ax.
The differential operator D² also has uncountably many eigenvalues and eigenvectors since for a > 0,

D²(cos(Öax))

=

-acos(Öax),
D²(sin(Öax))

=

-asin(Öax), and
D²(e^Öax)

=

ae^Öax.
In this context, here's an example which puts many of these pieces together. In quantum mechanics, a typical expression of Schrödinger's equation looks like

é
ę
ë - (^h/_2p)²
2m_e
Del²+V(x,y,z) - i(^h/_2p) ś
śt
ů
ú
ű Y = 0.

This example is for an electron (with mass m_e) in a potential field V(x,y,z).
The general solution of this operator equation is

Y(x,y,z,t) = Ľ
ĺ
n = 0
c_nY_n(x,y,z)exp ć
ç
č -iE_nt
(^h/_2p)
ö
÷
ř

where Y_n(x,y,z) is an eigenfunction solution of the associated time independent Schrödinger equation, with E_n the corresponding eigenvalue. The inner product, giving a time dependent probability, looks like

P(t) = ó
ő Y
Y

dv.

Exercises: Eigenvectors and eigenvalues Top

Show that if v is an eigenvector of T corresponding with the eigenvalue l, and a Î F (a š 0), then av is also an eigenvector of T corresponding with l. In particular, eigenvectors are not unique.
Define T Î L(F²) by T((x, y)) = (y, x). Find all the eigenvalues and eigenvectors of T.
Define T Î L(F³) by T((x, y, z)) = (-y, 0, 2z). Find all the eigenvalues and eigenvectors of T.
Define T Î L(F^Ľ) by T((a₁, a₂, ź)) = (a₂, a₃, ź) (i.e., T is the left shift operator). Find all the eigenvalues and eigenvectors of T.
Suppose T Î L(V) is invertible. Show that l š 0 is an eigenvalue of T if and only if l^-1 is an eigenvalue of T^-1.
Suppose S, T Î L(V). Show that ST and TS have the same eigenvalues.
Give an example of an operator whose matrix has all zeros on the diagonal with respect to some basis, but which is invertible. Give an example of an operator whose matrix has all diagonal elements non-zero with respect to some basis, but which is singular (i.e., has no inverse).
Show that if S, T Î L(V), S is invertible, and P(x) Î F[x], then P(S^-1TS) = S^-1P(T)S.
Suppose that V is a finite dimensional normed complex vector space, and T is an isometry of V (i.e., ||T(v)|| = ||v|| for all v Î V). Show that every eigenvalue l of T has |l| = 1. Hint: show that there is a basis for V consisting of eigenvectors e_i, with ||e_i|| = 1 for 1 Ł i Ł n.
Let V be a complex vector space, and T Î L(V). Let P(x) Î C[x]. Show that a Î C is an eigenvalue of P(T) if and only if a = P(l) for some eigenvalue l of T.
Is the preceding true for real vector spaces? Why not? (Note: this is another example of why C is so much nicer a place to work than R ...)
Is the preceding true if we replace C[x] with C^Ľ[x], power series? If so, show it. If not, give a counterexample.

Change of basis Top

Recall that the matrix associated with a linear transformation depends on the particular bases we use. In the case of a linear operator T : V Ž V, it is typical to use the same base for V as domain and as codomain. However, as we have seen, a basis consisting of eigenvectors is particularly convenient, and hence it is useful to know how to change bases.
If V is finite dimensional, with two bases S₁ = (u₁, ź, u_n) and S₂ = (v₁, ź, v_n), we can consider the matrices of T with respect to the mixed bases. We can indicate this by the various symbols [T]^S₁, [T]^S₂, [T]^S₁S₂, and [T]^S₂S₁, where, for example

T(u_i)

=

ĺ
j
[T]_ji^S₁u_j, and
T(u_i)

=

ĺ
j
[T]_ji^S₁S₂v_j.
We can look in particular at the matrix representations for the identity operator I. For ease of notation, let [B] = [I]^S₁S₂, the matrix representation for I using S₁ for the domain and S₂ for the codomain. In particular, we then have

u_i =
ĺ
j
[B]_jiv_j = B_i^jv_j.

We also have that the matrix inverse [B]^-1 = [I]^S₂S₁ which takes us in the opposite direction. Putting the pieces together, we then have

[B][T]^S₁ = [T]^S₂[B].

That is, doing the operator T in the basis S₁ and then converting to the basis S₂ is the same as converting bases first, and then doing T in the second base.
Another way to say this is:

[T]^S₁ = [B]^-1[T]^S₂[B]

or

[T]^S₂ = [B][T]^S₁[B]^-1.

Exercises: Change of basis Top

Show that if A and B are operators on a finite dimensional vector space with AB = I, the identity operator, then BA = I, and so B = A^-1.
Suppose that T is an operator that has the same matrix with respect to every basis. Show that T must be some multiple of the identity operator I.

Trace and determinant Top

Let V be a finite dimensional vector space over R or C, and T : V Ž V an operator on V. Let P_T(T) = ĺ_{i = 0}ⁿ a_iTⁱ be the characteristic polynomial of T. We can then define the trace of T by trace(T) = a_n-1, the coefficient of T^n-1.
We can also define the trace of an n x n real or complex matrix A by trace_m(A) = ĺ_{i = 1}ⁿA_ii, the sum of the diagonal elements of A.
These definitions are consistent, in the sense that trace(T) = trace_m([T]), where [T] is the matrix of T with respect to some basis for V. An exercise will be to show that it doesn't matter what basis we use (they all come out the same). Since these definitions are consistent, we will ordinarily write them both the same way, as trace().
For S,T : V Ž V, and a, b Î F, the trace has the nice properties:
1. trace(aS + bT) = a trace(T) + b trace(T). This says that the trace is a linear functional on L(V).
2. In the complex case, we have
  
  trace(T) =
  ĺ
  i
  l_i
  
  where the l_i are the eigenvalues of T.
3. trace(ST) = trace(TS).
From these we can derive nice facts like:
If we define the commutator of S and T by [S, T] = ST - TS, then we always have [S, T] š I, the identity operator.
We can define the determinant of an operator T by det(T) = (-1)ⁿ a₀, where a₀ is the constant term in the characteristic polynomial.
The determinant has the nice properties:
1. In the complex case, we have
  
  det
  (T) =
  Ő
  i
  l_i
  
  where the l_i are the eigenvalues of T.
2. det(T) š 0 iff ker(T) = {0} iff T is invertible.
3. det(ST) = det(S)det(T) = det(TS).
4. P_T(z) = det(zI - T). That is, the characteristic polynomial is the determinant of the generalized operator (zI -T), where we think of z as a complex variable.
5. We can define the determinant of a square matrix M = [a_ij] by
  
  det_m(M) =
  ĺ
  p Î perm(n)
  sign(p)
  Ő
  i
  a_p(i)i
  
  where perm(n) is the set of all permutations of the numbers (1, 2, ź, n), sign(p) is the sign of the permutation, and p(i) is the ith number in the permutation p.
  We then have:
  
  det_m([T]) = det
  (T).
  
  As for the trace, since the two definitions are consistent, we will typically denote both determinants by the same symbol det().
  Showing that these two definitions are consistent is a fair amount of work (just looking at the formula for det_m() should give you some idea). You can look it up if you are interested.
6. There is another definition of determinant. Suppose V is a real or complex n-dimensional vector space. Then there is an alternating multi-linear functional
  
  det_v: Vⁿ Ž F
  
  with the properties, for all v₁, v₂, ź, v_n, v_i₁, v_i₂ Î V and a, b Î F:
  1. det_v(v₁, ź, v_i, ź, v_j, ź, v_n) = - det_v(v₁, ź, v_j, ź, v_i, ź, v_n) (alternating)
  2. det_v(v₁, ź, av_i₁ + bv_i₂, ź, v_n) = adet_v(v₁, ź, v_i₁, ź, v_n) + bdet_v(v₁, ź, v_i₂, ź, v_n) (multi-linear)
  3. det_v(e₁, ź, e_n) = 1 for a particular basis {e₁, ź, e_n} for V. (uniqueness)
  We then have det_m(A) = det_v(A₁, ź, A_n), where A_k is the kth column of the matrix A, considered as a vector.
  We have that det(T) = det_m([T]) = det_v([T]₁, ź, [T]_n). The fact that there are three different versions of the same thing suggests that many people have worked on this topic, and that this topic occurs in a variety of contexts ...
7. Recall the discussion of norms on vector spaces. The Euclidean norms (|| v ||₂), which can be thought of as coming from an inner product ( < v₁, v₂ > ), have a nice relationship with the determinant:
  
  ||T(v)||₂ = || v ||₂ for all v Î U
  
  iff | det
  (T)| = 1.
8. An operator which preserves the Euclidean norm (||T(v)||₂ = ||v||₂) is called an isometry. When the scalar field is R, these are called orthogonal operators. In the case of C, they are called unitary operators. In the case of the quaternions, they are called symplectic operators.
9. The isometries on a normed linear space have the properties:
  1. If T is an isometry, then det(T) š 0, and hence T has an inverse, T^-1, which is also an isometry. Of course, I, the identity operator, is an isometry.
  2. If T₁ and T₂ are isometries, then T₁T₂ is also an isometry.
  3. This means that the set of isometries forms a group. In the three cases described above, these are called the orthogonal group, the unitary group, and the symplectic group.
10. Another way to characterize the isometries (in the finite dimensional case, but also with appropriate generalizations in infinite dimensional cases) is that an operator is an isometry if 1.) its inverse exists, and 2.) the matrix of its inverse is given by the conjugate transpose. That is, if
  
  [T^-1]_ij =
  [T]_ji
  
  .
11. The determinant also plays an important role in change of variables formulas (in Rⁿ, for example).
  Suppose W Ě Rⁿ and s: W Ž Rⁿ. We will think of s as a (local) coordinate system on W, or as a change of variables.
  The derivative of s at x is the unique operator T (if it exists) satisfying:
  
  lim
  yŽ 0
  ||s(x + y) - s(x) - Ty||
  ||y||
  = 0.
  
  If this operator T exists, s is called differentiable, and T is denoted by s˘(x). Note that s˘(x) Î L(Rⁿ). We can write s(x) = (s₁(x), ź, s_n(x)), and we can denote the partial derivative of s_j with respect to the kth coordinate by D_ks_j(x).
  If s is differentiable at x, then the matrix of s˘(x) is given by:
  
  [s˘(x)] = é
  ę
  ę
  ę
  ë
  
  D₁s₁(x)
  ź
  D_ns₁(x)
  
  :
  
  :
  
  D₁s_n(x)
  ź
  D_ns_n(x)
  ů
  ú
  ú
  ú
  ű
  
  If we assume that W is a reasonable set (e.g., open, or measurable) and f : WŽ R is also reasonable (e.g., continuous, or measurable), then we have the change of variables formula:
  
  ó
  ő
  
  s(W)
  f(y)dy = ó
  ő
  
  W
  f(s(x))| det
  (s˘(x))|dx.

Exercises: Trace and determinant Top

Given two real or complex n x n matrices A and B, show that trace_m(AB) = trace_m(BA).
Given an operator T on a real or complex finite dimensional vector space V, show that trace_m([T]) and det_m([T]) are independent of the basis used for the matrix representation [T]. Hint: use the change of basis formula [T]^S₁ = [B]^-1[T]^S₂[B], the previous problem, and det_m(AB) = det_m(BA).
Show that trace is linear, that is, trace(aS + bT) = atrace(S) + btrace(T).
Show that trace(T) = ĺ_{i = 1}ⁿ[T]_ii.
For two operators S and T, show that [S, T] š I. Recall that [S,T] = ST - TS.
Show by example that in general, trace(ST) š trace(S)trace(T). In particular, find an operator T on a real vector space V with trace(T²) < 0.
Suppose A is an n x n real matrix, S Î L(Rⁿ) has matrix representation A with respect to some basis, and T Î L(Cⁿ) has matrix representation A with respect to some basis. Show that trace(S) = trace(T) and det(S) = det(T).
Show that if T is an isometry on a finite dimensional normed vector space, then |det(T)| = 1.
Show that if T is an operator on a complex vector space of dimension n, and a Î C, then det(aT) = aⁿdet(T).

Top

References

[1]: Axler, Sheldon, Linear Algebra Done Right, second edition, Springer-verlag, New York, 1997.
[2]: Finney, Ross L., and Ostberg, Donald R., Elementary Differential Equations with Linear Algebra, Addison-Wesley, 1976.
[3]: Hoffman, Kenneth, and Kunze, Ray, Linear Algebra, second edition, Engineering/Science/Mathematics, 1971.
[4]: Naimark, M. A., Normed Algebras, Wolters-Noordhoff, Groningen, the Netherlands, 1974.
[5]: Noble, Ben, and Daniel, James W., Applied Linear Algebra, third edition, Engineering/Science/Mathematics, 1988.
[6]: Prugovecki, Eduard, Quantum Mechanics in Hilbert Space, second edition, Academic Press, New York, 1981.
[7]: Rudin, Walter, Functional Analysis, McGraw-Hill, New York, 1973.
[8]: Strang, Gilbert, Linear Algebra and its Applications, Academic Press, New York, 1976.

File translated from T_EX by T_TH, version 2.25.
On 30 May 2001, 23:16.

A brief survey of linear algebra

Tom Carter http://cogs.csustan.edu/~tom/linear-algebra Santa Fe Institute Complex Systems Summer School

June, 2001

References

Tom Carter

http://cogs.csustan.edu/~tom/linear-algebra
Santa Fe Institute
Complex Systems Summer School