
Linear Algebra and Python Programming
Introduction to Linear Spaces: From Abstraction to Calculation
Linear algebra provides the foundational language for multi-dimensional mathematics. The power of its definitions lies in their abstraction, but this power is unlocked through concrete computation. Let's explore these concepts with a focus on the calculations that bring them to life.
- Definition of Linear Spaces (Vector Spaces)
A Linear Space \( V \) over a field \( \mathbb{F} \) (like \( \mathbb{R} \)) is a set of objects (vectors) on which two operations, vector addition and scalar multiplication, are defined and satisfy the eight vector space axioms (associativity, commutativity, identity elements, inverses, distributivity, etc.). In addition, these operations must be closed: performing them on vectors in \( V \) must result in another vector in \( V \).
Calculation Examples:
Let's verify the vector space axioms for a non-obvious example.
Example 1: The Space of Real-Valued Functions on \( \mathbb{R} \).
Let \( V \) be the set of all functions \( f: \mathbb{R} \to \mathbb{R} \). Define:
- Addition: \( (f + g)(x) = f(x) + g(x) \)
- Scalar Multiplication: \( (c \cdot f)(x) = c \cdot f(x) \)
Let's check a few axioms with functions \( f(x) = x^2 \), \( g(x) = \sin(x) \), and \( h(x) = e^x \).
- Commutativity of Addition:
* LHS: \( (f + g)(x) = x^2 + \sin(x) \)
* RHS: \( (g + f)(x) = \sin(x) + x^2 \)
* Since real number addition is commutative, \( x^2 + \sin(x) = \sin(x) + x^2 \). Thus, \( f + g = g + f \). Verified.
- Distributivity of Scalars:
* Let \( a=2 \), \( b=3 \).
* LHS: \( ((a+b) \cdot f)(x) = (5 \cdot f)(x) = 5x^2 \)
* RHS: \( (a \cdot f + b \cdot f)(x) = (2x^2) + (3x^2) = 5x^2 \)
* LHS = RHS. Verified.
- Additive Identity:
* The zero vector is the zero function, \( \mathbf{0}(x) = 0 \).
* Check: \( (f + \mathbf{0})(x) = x^2 + 0 = x^2 = f(x) \). Verified.
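Since this material pairs the linear algebra with Python, here is a minimal sketch of the pointwise operations on functions (the helper names `add` and `scale` are illustrative, not from the text):

```python
import math

# Pointwise operations on real-valued functions, mirroring the definitions above.
def add(f, g):
    return lambda x: f(x) + g(x)

def scale(c, f):
    return lambda x: c * f(x)

f = lambda x: x**2
g = math.sin

print(add(f, g)(2.0))                       # 4 + sin(2) ~ 4.9093
print(scale(5, f)(2.0))                     # 20.0
print(add(scale(2, f), scale(3, f))(2.0))   # 20.0, matching (2 + 3) * f
```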
Example 2: A Set that is Not a Vector Space.
Let \( W = \{ (x, y) \in \mathbb{R}^2 : x \geq 0, y \geq 0 \} \). This is the first quadrant including the axes.
- Closure under Addition? Yes. If \( (x_1, y_1) \) and \( (x_2, y_2) \) are in \( W \), then \( (x_1+x_2, y_1+y_2) \) has non-negative components.
- Closure under Scalar Multiplication? No. Let \( \mathbf{u} = (1, 1) \in W \) and scalar \( c = -1 \).
* \( c \cdot \mathbf{u} = (-1)\cdot(1, 1) = (-1, -1) \).
* \( (-1, -1) \) is not in \( W \) because its components are negative.
* Conclusion: \( W \) is not a vector space.
- Subspaces
A subspace is a subset \( W \) of a vector space \( V \) that is itself a vector space under the same operations. The Subspace Test simplifies verification to three critical checks.
Calculation Examples:
Example 1: A Line Through the Origin in \( \mathbb{R}^3 \).
Let \( W = \{ (x, y, z) \in \mathbb{R}^3 : x = 2t, y = -t, z = t, t \in \mathbb{R} \} \). This is the set of all scalar multiples of the vector \( (2, -1, 1) \).
- Contains Zero Vector? Let \( t=0 \). Then \( (0, 0, 0) \in W \). Yes.
- Closure under Addition? Let \( \mathbf{u} = (2a, -a, a) \) and \( \mathbf{v} = (2b, -b, b) \) be in \( W \).
* \( \mathbf{u} + \mathbf{v} = (2a+2b, -a-b, a+b) = (2(a+b), -(a+b), (a+b)) \).
* This is of the form \( (2t, -t, t) \) with \( t = a+b \). So \( \mathbf{u} + \mathbf{v} \in W \). Yes.
- Closure under Scalar Multiplication? Let \( \mathbf{u} = (2t, -t, t) \in W \) and \( c \in \mathbb{R} \).
* \( c\mathbf{u} = (2ct, -ct, ct) = (2t', -t', t') \) where \( t' = ct \).
* So \( c\mathbf{u} \in W \). Yes.
Since all three conditions hold, \( W \) is a subspace.
Example 2: The Set of Solutions to a Homogeneous System.
Determine if the following set is a subspace of \( \mathbb{R}^3 \):
\( W = \{ (x, y, z) : 2x - y + 3z = 0 \} \).
- Contains Zero Vector? \( 2(0) - 0 + 3(0) = 0 \). So \( (0,0,0) \in W \). Yes.
- Closure under Addition? Let \( \mathbf{u}=(x_1, y_1, z_1) \) and \( \mathbf{v}=(x_2, y_2, z_2) \) be in \( W \). This means:
\( 2x_1 - y_1 + 3z_1 = 0 \) and \( 2x_2 - y_2 + 3z_2 = 0 \).
* Now, \( \mathbf{u} + \mathbf{v} = (x_1+x_2, y_1+y_2, z_1+z_2) \).
* Check the condition: \( 2(x_1+x_2) - (y_1+y_2) + 3(z_1+z_2) \)
* \( = (2x_1 - y_1 + 3z_1) + (2x_2 - y_2 + 3z_2) = 0 + 0 = 0 \).
* So \( \mathbf{u} + \mathbf{v} \in W \). Yes.
- Closure under Scalar Multiplication? Let \( \mathbf{u}=(x, y, z) \in W \) and \( c \in \mathbb{R} \).
* \( c\mathbf{u} = (cx, cy, cz) \).
* Check the condition: \( 2(cx) - (cy) + 3(cz) = c(2x - y + 3z) = c(0) = 0 \).
* So \( c\mathbf{u} \in W \). Yes.
Therefore, \( W \) is a subspace (specifically, a plane through the origin).
- Linear Span
The span of a set \( S = \{\mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_k\} \) is the set of all linear combinations:
\( \text{span}(S) = \{ a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + ... + a_k\mathbf{v}_k : a_i \in \mathbb{R} \} \).
Calculation Examples:
Example 1: Span of Two Vectors in \( \mathbb{R}^3 \).
Let \( S = \{ (1, 2, 3), (4, 5, 6) \} \). Find \( \text{span}(S) \).
* \( \text{span}(S) = \{ a(1, 2, 3) + b(4, 5, 6) : a, b \in \mathbb{R} \} \)
* \( = \{ (a + 4b,\ 2a + 5b,\ 3a + 6b) : a, b \in \mathbb{R} \} \).
This describes a plane in \( \mathbb{R}^3 \) that passes through the origin and contains the vectors \( (1,2,3) \) and \( (4,5,6) \).
Example 2: Determining if a Vector is in a Span.
Is the vector \( \mathbf{b} = (7, 8, 9) \) in \( \text{span}(S) \) from Example 1?
We need to check if there exist scalars \( a, b \) such that:
\( a(1, 2, 3) + b(4, 5, 6) = (7, 8, 9) \).
This vector equation leads to a system of linear equations:
- \( a + 4b = 7 \)
- \( 2a + 5b = 8 \)
- \( 3a + 6b = 9 \)
Let's solve the system. From equation (1), \( a = 7 - 4b \). Substitute into equation (2):
\( 2(7 - 4b) + 5b = 8 \)
\( 14 - 8b + 5b = 8 \)
\( -3b = -6 \)
\( b = 2 \)
Then \( a = 7 - 4(2) = -1 \).
Now check with equation (3): \( 3(-1) + 6(2) = -3 + 12 = 9 \). This works.
Since a solution exists \( (a=-1, b=2) \), the vector \( \mathbf{b} \) is in the span. In fact, \( \mathbf{b} = -1\cdot(1,2,3) + 2\cdot(4,5,6) \).
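As a computational cross-check, the same system can be solved with NumPy's least-squares routine (a sketch, with the spanning vectors stored as columns):

```python
import numpy as np

# Columns of S are the spanning vectors (1,2,3) and (4,5,6); b is the target vector.
S = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])
b = np.array([7.0, 8.0, 9.0])

coeffs, residual, rank, _ = np.linalg.lstsq(S, b, rcond=None)
print(coeffs)                      # approximately [-1.  2.]
print(np.allclose(S @ coeffs, b))  # True, so (7, 8, 9) lies in span(S)
```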
- Linear Dependence and Independence
A set \( S = \{\mathbf{v}_1, \mathbf{v}_2, ..., \mathbf{v}_k\} \) is linearly independent if the only solution to \( a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + ... + a_k\mathbf{v}_k = \mathbf{0} \) is \( a_1 = a_2 = ... = a_k = 0 \). Otherwise, it is linearly dependent.
Calculation Examples:
Example 1: Testing for Independence in \( \mathbb{R}^3 \).
Determine if \( S = \{ (1, 2, 3), (4, 5, 6), (2, 1, 0) \} \) is linearly independent.
Set up the equation: \( a(1, 2, 3) + b(4, 5, 6) + c(2, 1, 0) = (0, 0, 0) \).
This gives the system:
- \( a + 4b + 2c = 0 \)
- \( 2a + 5b + c = 0 \)
- \( 3a + 6b + 0c = 0 \)
From equation (3): \( 3a + 6b = 0 \Rightarrow a = -2b \).
Substitute into equation (2): \( 2(-2b) + 5b + c = 0 \Rightarrow -4b + 5b + c = 0 \Rightarrow b + c = 0 \Rightarrow c = -b \).
Substitute \( a = -2b \) and \( c = -b \) into equation (1): \( (-2b) + 4b + 2(-b) = -2b + 4b - 2b = 0 \). This is true for any \( b \).
The system has infinitely many solutions. For example, let \( b=1 \), then \( a=-2 \), \( c=-1 \). This is a non-trivial solution: \( -2(1,2,3) + 1(4,5,6) - 1(2,1,0) = (0,0,0) \). Therefore, the set \( S \) is linearly dependent.
Example 2: A Linearly Independent Set of Polynomials.
Show that \( S = \{1, x, x^2\} \) in \( P_2 \) is linearly independent.
Set up the equation: \( a\cdot1 + b\cdot x + c\cdot x^2 = 0 \) (the zero polynomial).
For this polynomial to be zero for all \( x \), all coefficients must be zero:
* Constant term: \( a = 0 \)
* \( x \) term: \( b = 0 \)
* \( x^2 \) term: \( c = 0 \)
The only solution is the trivial solution \( a=b=c=0 \). Therefore, the set is linearly independent.
- Basis and Dimension
A basis \( B \) for a vector space \( V \) is a linearly independent set that spans \( V \). The dimension of \( V \) is the number of vectors in any basis for \( V \).
Calculation Examples:
Example 1: Finding a Basis for a Span.
Find a basis for \( W = \text{span}\{ (1, 2, 3), (4, 5, 6), (2, 1, 0) \} \).
From the dependence calculation above, we found that \( (2,1,0) \) can be expressed in terms of the first two: \( (2,1,0) = -2(1,2,3) + 1(4,5,6) \). This means the third vector is redundant. The set \( \{ (1, 2, 3), (4, 5, 6) \} \) still spans \( W \) and is linearly independent (check: one is not a scalar multiple of the other). Therefore, a basis for \( W \) is \( \{ (1, 2, 3), (4, 5, 6) \} \), and \( \text{dim}(W) = 2 \).
Example 2: Finding the Dimension and a Basis for a Solution Space.
Find the dimension and a basis for the subspace \( W \) of \( \mathbb{R}^4 \) defined by:
\( x + 2y - z + 3w = 0 \)
This is a homogeneous system with one equation. We have 4 variables and 1 leading variable (let's choose \( x \)), so there will be \( 4-1=3 \) free variables: \( y, z, w \).
Let's express the solution in parametric form:
\( x = -2y + z - 3w \)
So, any solution is of the form:
\( (x, y, z, w) = (-2y + z - 3w,\ y,\ z,\ w) \)
We can separate the free variables:
\( = y(-2, 1, 0, 0) + z(1, 0, 1, 0) + w(-3, 0, 0, 1) \)
The set \( \{ (-2, 1, 0, 0),\ (1, 0, 1, 0),\ (-3, 0, 0, 1) \} \) spans \( W \). It is also linearly independent because each vector has a 1 in a position where the others have 0 (look at the 2nd, 3rd, and 4th coordinates). Therefore, this set is a basis for \( W \), and \( \text{dim}(W) = 3 \).
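A numerical counterpart (a sketch using SciPy's `null_space`, which returns an orthonormal basis, so the vectors differ from the hand-derived ones but span the same subspace):

```python
import numpy as np
from scipy.linalg import null_space

# Coefficient matrix of the single homogeneous equation x + 2y - z + 3w = 0.
A = np.array([[1.0, 2.0, -1.0, 3.0]])

N = null_space(A)             # columns form an orthonormal basis of W
print(N.shape)                # (4, 3)  ->  dim(W) = 3
print(np.allclose(A @ N, 0))  # True: every basis vector satisfies the equation
```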
Concept Review
Given vectors \( \mathbf{a}, \mathbf{b}, \mathbf{c} \) in \( \mathbb{R}^3 \), form a matrix with them as columns: \( A = \begin{bmatrix} | & | & | \\ \mathbf{a} & \mathbf{b} & \mathbf{c} \\ | & | & | \end{bmatrix} \)
Then solve
$$ A \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \mathbf{0} $$
If the only solution is \(x = y = z = 0 \), the vectors are linearly independent.
How to test independence
Option 1: Gaussian elimination
You row-reduce the matrix to row echelon form (REF) (upper triangular shape).
If the number of pivots = number of columns, the vectors are independent.
Option 2: Gauss-Jordan elimination
You continue further to reduced row echelon form (RREF) (diagonal 1's and 0's elsewhere).
Same logic: if you get 3 pivot columns, the vectors are independent.
Both methods lead to the same conclusion; Gauss-Jordan just simplifies the matrix further.
Example
Let's take:
$$ \mathbf{a} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}, \quad \mathbf{c} = \begin{bmatrix} 0 \\ 1 \\ 4 \end{bmatrix} $$
Form the matrix:
$$ A = \begin{bmatrix} 1 & 2 & 0 \\ 2 & -1 & 1 \\ 3 & 0 & 4 \end{bmatrix} $$
Step 1: Apply Gaussian elimination
Start:
$$ \begin{bmatrix} 1 & 2 & 0 \\ 2 & -1 & 1 \\ 3 & 0 & 4 \end{bmatrix} $$
Make zeros below the first pivot (row 1, col 1):
$$(R_2 \to R_2 - 2R_1)$$
$$(R_3 \to R_3 - 3R_1)$$
$$ \Rightarrow \begin{bmatrix} 1 & 2 & 0 \\ 0 & -5 & 1 \\ 0 & -6 & 4 \end{bmatrix} $$
Now pivot on row 2, col 2 = -5.
Make zero below it:
$$(R_3 \to R_3 - \frac{-6}{-5}R_2 = R_3 - 1.2R_2)$$
$$ \Rightarrow \begin{bmatrix} 1 & 2 & 0 \\ 0 & -5 & 1 \\ 0 & 0 & 2.8 \end{bmatrix} $$
Step 2: Interpret
You have 3 pivots (one in each column). That means:
$$\text{rank}(A) = 3$$
Since rank = number of vectors = 3, the vectors are linearly independent.
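The same pivot-counting test can be delegated to NumPy's `matrix_rank` (a minimal sketch, with the three vectors stored as columns):

```python
import numpy as np

# Columns are the vectors a, b, c from the example above.
A = np.array([[1.0,  2.0, 0.0],
              [2.0, -1.0, 1.0],
              [3.0,  0.0, 4.0]])

rank = np.linalg.matrix_rank(A)
print(rank)                # 3
print(rank == A.shape[1])  # True -> the columns are linearly independent
```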
Linear Transformations and Their Matrix Representations
Introduction
At the heart of linear algebra lies the concept of a linear transformation. It is a special kind of function that moves vectors from one vector space to another while preserving the core algebraic structures of vector addition and scalar multiplication. Understanding linear transformations is crucial because they are the language through which we describe rotations, projections, shears, and countless other operations in mathematics, physics, computer graphics, and engineering.
This article will guide you through:
- The formal definition of a linear transformation and key examples.
- The profound connection between linear transformations and matrices.
- How the matrix representation of a transformation *depends* on the choice of basis for the vector spaces.
- Linear Transformations: Definition and Examples
1.1 Formal Definition
Let \( V \) and \( W \) be vector spaces over the same field \( \mathbb{F} \) (e.g., the real numbers \( \mathbb{R} \) or complex numbers \( \mathbb{C} \)). A function \( T: V \to W \) is called a linear transformation if it satisfies the following two properties for all vectors \( \mathbf{u}, \mathbf{v} \in V \) and all scalars \( c \in \mathbb{F} \):
- Additivity (Preservation of Addition):
\( T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}) \)
- Homogeneity (Preservation of Scalar Multiplication):
\( T(c\mathbf{u}) = cT(\mathbf{u}) \)
These two properties can be combined into a single condition:
\[ T(c_1\mathbf{v}_1 + c_2\mathbf{v}_2) = c_1T(\mathbf{v}_1) + c_2T(\mathbf{v}_2) \]
for all vectors \( \mathbf{v}_1, \mathbf{v}_2 \in V \) and all scalars \( c_1, c_2 \in \mathbb{F} \).
Key Consequences:
* \( T(\mathbf{0}_V) = \mathbf{0}_W \). The zero vector in \( V \) maps to the zero vector in \( W \).
* \( T(-\mathbf{v}) = -T(\mathbf{v}) \).
1.2 Key Examples
Let's explore some fundamental examples of linear transformations.
Example 1.1: The Zero Transformation
The function \( T: V \to W \) defined by \( T(\mathbf{v}) = \mathbf{0}_W \) for all \( \mathbf{v} \in V \) is linear. It maps every vector to the zero vector.
Example 1.2: The Identity Transformation
The function \( T: V \to V \) defined by \( T(\mathbf{v}) = \mathbf{v} \) for all \( \mathbf{v} \in V \) is linear. It leaves every vector unchanged.
Example 1.3: Transformations from \( \mathbb{R}^2 \) to \( \mathbb{R}^2 \) (Geometric Transformations)
These are easy to visualize.
* Rotation (\( R_\theta \)): Rotates vectors counterclockwise by an angle \( \theta \) about the origin.
* *Linearity:* Rotating the sum of two vectors is the same as adding the two rotated vectors.
* Reflection: Reflects vectors across a line through the origin (e.g., the x-axis, y-axis, or line \(y=x\)).
* Projection (\( P_x \)): Projects vectors onto a line through the origin. For example, projection onto the x-axis: \( P_x(x, y) = (x, 0) \).
* *Linearity:* The projection of a sum is the sum of the projections.
* Scaling (Dilation/Contraction): Scales vectors by a constant factor. For example, \( T(x, y) = (2x, 2y) \) scales everything by 2.
Example 1.4: Matrix-Vector Multiplication as a Linear Transformation
Let \( A \) be an \( m \times n \) matrix. Then the function \( T: \mathbb{R}^n \to \mathbb{R}^m \) defined by \( T(\mathbf{x}) = A\mathbf{x} \) is a linear transformation.
* *Proof:* \( T(\mathbf{u} + \mathbf{v}) = A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v} = T(\mathbf{u}) + T(\mathbf{v}) \). Similarly, \( T(c\mathbf{u}) = A(c\mathbf{u}) = c(A\mathbf{u}) = cT(\mathbf{u}) \).
This example is critically important because, as we will see, every linear transformation between finite-dimensional vector spaces can be expressed as matrix multiplication.
Example 1.5: The Derivative as a Linear Transformation
Let \( P_n \) be the vector space of all polynomials of degree at most \( n \). The derivative operator \( D: P_n \to P_{n-1} \) defined by \( D(p(x)) = p'(x) \) is linear.
* *Proof:* \( D(p(x) + q(x)) = p'(x) + q'(x) = D(p(x)) + D(q(x)) \), and \( D(c \cdot p(x)) = c \cdot p'(x) = c \cdot D(p(x)) \).
- Matrix Representation of Linear Transformations
We now bridge the gap between the abstract concept of a linear transformation and the concrete world of matrices. The key idea is that once we choose a basis for the domain and codomain, a linear transformation can be completely described by a matrix.
2.1 The Standard Matrix of a Linear Transformation
Let \( T: \mathbb{R}^n \to \mathbb{R}^m \) be a linear transformation. Let \( \mathcal{E} = \{ \mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n \} \) be the standard basis for \( \mathbb{R}^n \). The standard matrix \( A \) of \( T \) is the \( m \times n \) matrix whose \( j\)-th column is the vector \( T(\mathbf{e}_j) \). That is,
\[ A = \begin{bmatrix} | & | & & | \\ T(\mathbf{e}_1) & T(\mathbf{e}_2) & \cdots & T(\mathbf{e}_n) \\ | & | & & | \end{bmatrix} \]
Then, for any vector \( \mathbf{x} \in \mathbb{R}^n \), we have \( T(\mathbf{x}) = A\mathbf{x} \).
Why does this work?
Any vector \( \mathbf{x} = (x_1, x_2, \dots, x_n) \) can be written as \( \mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \dots + x_n\mathbf{e}_n \). By linearity:
\[ T(\mathbf{x}) = T(x_1\mathbf{e}_1 + \dots + x_n\mathbf{e}_n) = x_1T(\mathbf{e}_1) + \dots + x_nT(\mathbf{e}_n) \]
The right-hand side is exactly a linear combination of the columns of \( A \), which is \( A\mathbf{x} \).
Example 2.1: Finding the Standard Matrix
Let \( T: \mathbb{R}^2 \to \mathbb{R}^2 \) be the transformation that rotates vectors by \( 90^\circ \) counterclockwise.
* The standard basis vectors are \( \mathbf{e}_1 = (1, 0) \) and \( \mathbf{e}_2 = (0, 1) \).
* Compute their images:
* \( T(\mathbf{e}_1) = T(1, 0) = (0, 1) \)
* \( T(\mathbf{e}_2) = T(0, 1) = (-1, 0) \)
* The standard matrix \( A \) is formed by using these images as columns:
\[ A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \]
* Let's verify with a vector, say \( \mathbf{x} = (3, 2) \). Geometrically, rotating (3,2) by \(90^\circ\) should give (-2, 3).
\[ T(\mathbf{x}) = A\mathbf{x} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} (0)(3) + (-1)(2) \\ (1)(3) + (0)(2) \end{bmatrix} = \begin{bmatrix} -2 \\ 3 \end{bmatrix} \quad \checkmark \]
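A quick NumPy check of this rotation example (a sketch, not part of the derivation above):

```python
import numpy as np

# Standard matrix of the 90-degree counterclockwise rotation:
# its columns are the images T(e1) = (0, 1) and T(e2) = (-1, 0).
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

x = np.array([3.0, 2.0])
print(A @ x)   # [-2.  3.]
```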
2.2 The Matrix Representation with Respect to Arbitrary Bases
The concept of the standard matrix generalizes to any linear transformation between finite-dimensional vector spaces and any choice of bases.
Theorem: Let \( V \) and \( W \) be finite-dimensional vector spaces with bases \( \mathcal{B} = \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\} \) and \( \mathcal{C} = \{\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_m\} \) respectively. Let \( T: V \to W \) be a linear transformation. Then there exists a unique \( m \times n \) matrix \( [T]_{\mathcal{C}\leftarrow \mathcal{B}} \) such that for every vector \( \mathbf{v} \in V \),
\[ [T(\mathbf{v})]_{\mathcal{C}} = [T]_{\mathcal{C}\leftarrow \mathcal{B}} [\mathbf{v}]_{\mathcal{B}} \]
Here, \( [\mathbf{v}]_{\mathcal{B}} \) denotes the coordinate vector of \( \mathbf{v} \) with respect to basis \( \mathcal{B} \).
How to find \( [T]_{\mathcal{C}\leftarrow \mathcal{B}} \):
The \( j\)-th column of \( [T]_{\mathcal{C}\leftarrow \mathcal{B}} \) is the coordinate vector of \( T(\mathbf{b}_j) \) with respect to the basis \( \mathcal{C} \). That is,
- Apply \( T \) to the \( j\)-th basis vector of \( \mathcal{B} \): \( T(\mathbf{b}_j) \).
- Find the coordinates of this result with respect to the basis \( \mathcal{C} \): \( [T(\mathbf{b}_j)]_{\mathcal{C}} \).
- This coordinate vector becomes the \( j\)-th column of the matrix.
\[ [T]_{\mathcal{C}\leftarrow \mathcal{B}} = \begin{bmatrix} | & | & & | \\ [T(\mathbf{b}_1)]_{\mathcal{C}} & [T(\mathbf{b}_2)]_{\mathcal{C}} & \cdots & [T(\mathbf{b}_n)]_{\mathcal{C}} \\ | & | & & | \end{bmatrix} \]
Example 2.2: Matrix Representation with Non-Standard Bases
Let \( T: \mathbb{R}^2 \to \mathbb{R}^2 \) be defined by \( T(x, y) = (x + 2y, 3x + 4y) \). Let \( \mathcal{B} = \mathcal{C} = \left\{ \mathbf{b}_1 = (1,1), \mathbf{b}_2 = (1,-1) \right\} \). Find \( [T]_{\mathcal{C}\leftarrow \mathcal{B}} \).
* Step 1: Apply \( T \) to each basis vector in \( \mathcal{B} \).
* \( T(\mathbf{b}_1) = T(1, 1) = (1+2, 3+4) = (3, 7) \)
* \( T(\mathbf{b}_2) = T(1, -1) = (1-2, 3-4) = (-1, -1) \)
* Step 2: Find the coordinate vectors of these results with respect to \( \mathcal{C} \). We need to solve:
* \( [T(\mathbf{b}_1)]_{\mathcal{C}} = [(3,7)]_{\mathcal{C}} = \begin{bmatrix} a \\ b \end{bmatrix} \) such that \( a(1,1) + b(1,-1) = (3,7) \).
This gives the system: \( a + b = 3 \), \( a - b = 7 \). Solving: \( a = 5, b = -2 \). So, \( [T(\mathbf{b}_1)]_{\mathcal{C}} = \begin{bmatrix} 5 \\ -2 \end{bmatrix} \).
* \( [T(\mathbf{b}_2)]_{\mathcal{C}} = [(-1,-1)]_{\mathcal{C}} = \begin{bmatrix} c \\ d \end{bmatrix} \) such that \( c(1,1) + d(1,-1) = (-1,-1) \).
This gives the system: \( c + d = -1 \), \( c - d = -1 \). Solving: \( c = -1, d = 0 \). So, \( [T(\mathbf{b}_2)]_{\mathcal{C}} = \begin{bmatrix} -1 \\ 0 \end{bmatrix} \).
* Step 3: Assemble the matrix.
\[ [T]_{\mathcal{C}\leftarrow \mathcal{B}} = \begin{bmatrix} 5 & -1 \\ -2 & 0 \end{bmatrix} \]
* Verification: Let's test with a vector, say \( \mathbf{v} = (4, 2) \). First, find its coordinate vector relative to \( \mathcal{B} \). We need \( [\mathbf{v}]_{\mathcal{B}} = \begin{bmatrix} p \\ q \end{bmatrix} \) such that \( p(1,1) + q(1,-1) = (4,2) \). This gives \( p+q=4, p-q=2 \), so \( p=3, q=1 \). Thus, \( [\mathbf{v}]_{\mathcal{B}} = \begin{bmatrix} 3 \\ 1 \end{bmatrix} \).
Now, compute \( [T(\mathbf{v})]_{\mathcal{C}} = [T]_{\mathcal{C}\leftarrow \mathcal{B}} [\mathbf{v}]_{\mathcal{B}} = \begin{bmatrix} 5 & -1 \\ -2 & 0 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \end{bmatrix} = \begin{bmatrix} 14 \\ -6 \end{bmatrix} \).
What does this mean? It means \( T(\mathbf{v}) = 14\mathbf{b}_1 + (-6)\mathbf{b}_2 = 14(1,1) - 6(1,-1) = (14-6, 14+6) = (8, 20) \).
Let's check directly: \( T(4, 2) = (4 + 2\times 2, 3\times 4 + 4\times 2) = (8, 20) \). It matches! \( \checkmark \)
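The three steps of this construction can be mirrored numerically; the following NumPy sketch solves for the \( \mathcal{C} \)-coordinates of each image, column by column:

```python
import numpy as np

# Standard matrix of T(x, y) = (x + 2y, 3x + 4y).
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Basis vectors b1 = (1, 1) and b2 = (1, -1) stored as the columns of P.
P = np.array([[1.0,  1.0],
              [1.0, -1.0]])

# Column j of [T]_{C<-B}: solve P @ c = T(b_j) for the C-coordinates c.
columns = [np.linalg.solve(P, A @ P[:, j]) for j in range(2)]
T_CB = np.column_stack(columns)
print(T_CB)   # [[ 5. -1.]
              #  [-2.  0.]]
```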
- Dependence of the Matrix Representation on Bases
The matrix \( [T]_{\mathcal{C}\leftarrow \mathcal{B}} \) is a *representation* of the abstract transformation \( T \). This representation is not unique; it depends entirely on the choices of the bases \( \mathcal{B} \) and \( \mathcal{C} \). The same linear transformation can look very different when viewed through different "lenses" (bases).
3.1 Illustrative Example: The Identity Transformation
Let \( V = \mathbb{R}^2 \), and let \( T: V \to V \) be the identity transformation: \( T(\mathbf{v}) = \mathbf{v} \).
* Case 1: Let \( \mathcal{B} = \mathcal{C} = \mathcal{E} \) (the standard basis). Then:
* \( T(\mathbf{e}_1) = \mathbf{e}_1 = (1, 0) \), so \( [T(\mathbf{e}_1)]_{\mathcal{E}} = (1, 0)^T \).
* \( T(\mathbf{e}_2) = \mathbf{e}_2 = (0, 1) \), so \( [T(\mathbf{e}_2)]_{\mathcal{E}} = (0, 1)^T \).
* The matrix representation is \( [T]_{\mathcal{E}\leftarrow \mathcal{E}} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2 \).
* Case 2: Let \( \mathcal{B} = \mathcal{E} \), but \( \mathcal{C} = \left\{ (1,1), (1,-1) \right\} \). Then:
* \( T(\mathbf{e}_1) = \mathbf{e}_1 = (1, 0) \). We need \( [T(\mathbf{e}_1)]_{\mathcal{C}} = [(1,0)]_{\mathcal{C}} \). Solve \( a(1,1) + b(1,-1) = (1,0) \). Solution: \( a = 0.5, b = 0.5 \).
* \( T(\mathbf{e}_2) = \mathbf{e}_2 = (0, 1) \). We need \( [T(\mathbf{e}_2)]_{\mathcal{C}} = [(0,1)]_{\mathcal{C}} \). Solve \( c(1,1) + d(1,-1) = (0,1) \). Solution: \( c = 0.5, d = -0.5 \).
* The matrix representation is \( [T]_{\mathcal{C}\leftarrow \mathcal{E}} = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & -0.5 \end{bmatrix} \).
This is a stark difference! The identity transformation, which is represented by the simple identity matrix with respect to the standard basis, is represented by a completely different, non-identity matrix with respect to a different codomain basis.
3.2 Change of Basis and Similarity
A particularly important case is when \( T: V \to V \) is a linear operator (a transformation from a space to itself). In this case, we typically use the same basis for the domain and codomain. Let \( \mathcal{B} \) and \( \mathcal{B}' \) be two different bases for \( V \). We have two different matrix representations for the same transformation \( T \):
* \( A = [T]_{\mathcal{B}} = [T]_{\mathcal{B}\leftarrow \mathcal{B}} \)
* \( A' = [T]_{\mathcal{B}'} = [T]_{\mathcal{B}'\leftarrow \mathcal{B}'} \)
How are \( A \) and \( A' \) related? They are similar matrices .
Theorem: Let \( P = [I]_{\mathcal{B}\leftarrow \mathcal{B}'} \) be the change-of-basis matrix from \( \mathcal{B}' \) to \( \mathcal{B} \) (its columns are the \( \mathcal{B}' \)-basis vectors written in terms of \( \mathcal{B} \)). Then the matrix representations are related by:
\[ A' = P^{-1} A P \quad \text{or equivalently} \quad A = P A' P^{-1} \]
This relationship is fundamental. It tells us that while the matrix for a transformation can change when we change the basis, all these different matrices are *similar*. They share important properties that are intrinsic to the transformation itself, such as:
* The determinant (\( \det(A') = \det(A) \))
* The trace (\( \operatorname{tr}(A') = \operatorname{tr}(A) \))
* The eigenvalues
* The rank
These are called invariants of the linear transformation.
Example 3.1: Similarity Transformation
Let \( T: \mathbb{R}^2 \to \mathbb{R}^2 \) be defined by \( T(x, y) = (x+2y, 3x+4y) \).
* With the standard basis \( \mathcal{E} \), we have \( A = [T]_{\mathcal{E}} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \).
* Now, use the basis \( \mathcal{B} = \left\{ (1,1), (1,-1) \right\} \). Let's find \( A' = [T]_{\mathcal{B}} \).
* Step 1: Find the change-of-basis matrix \( P = [I]_{\mathcal{E}\leftarrow \mathcal{B}} \). The columns of \( P \) are the \( \mathcal{B} \)-basis vectors written in the standard basis.
\[ P = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \]
* Step 2: Find \( P^{-1} \). For a \( 2\times2 \) matrix, \( P^{-1} = \frac{1}{-2} \begin{bmatrix} -1 & -1 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & -0.5 \end{bmatrix} \).
* Step 3: Compute \( A' = P^{-1} A P \).
\[ A' = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & -0.5 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \]
First, compute \( AP \): \( \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 3 & -1 \\ 7 & -1 \end{bmatrix} \).
Then, compute \( P^{-1}(AP) \): \( \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & -0.5 \end{bmatrix} \begin{bmatrix} 3 & -1 \\ 7 & -1 \end{bmatrix} = \begin{bmatrix} 5 & -1 \\ -2 & 0 \end{bmatrix} \).
So, \( A' = [T]_{\mathcal{B}} = \begin{bmatrix} 5 & -1 \\ -2 & 0 \end{bmatrix} \).
Notice that this matches the matrix we found in Example 2.2, where we had \( \mathcal{B} = \mathcal{C} \). This confirms the relationship \( A' = P^{-1} A P \).
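A short NumPy sketch confirming the similarity relation and the invariants listed above (trace, determinant, eigenvalues):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
P = np.array([[1.0,  1.0],
              [1.0, -1.0]])

A_prime = np.linalg.inv(P) @ A @ P
print(A_prime)                                               # [[ 5. -1.] [-2.  0.]]
print(np.isclose(np.trace(A), np.trace(A_prime)))            # True (both 5)
print(np.isclose(np.linalg.det(A), np.linalg.det(A_prime)))  # True (both -2)
print(np.sort(np.linalg.eigvals(A)))                         # same eigenvalues...
print(np.sort(np.linalg.eigvals(A_prime)))                   # ...for both matrices
```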
Conclusion
Linear transformations are the central objects of study in linear algebra. They provide a unified way to describe a vast array of operations.
- Definition: They are functions that preserve vector space structure (addition and scalar multiplication).
- Matrix Representation: For finite-dimensional spaces, every linear transformation can be represented by a matrix. The columns of this matrix are the images of the domain basis vectors, expressed in coordinates relative to the codomain basis.
- Dependence on Bases: The matrix representation is not unique. It is a *coordinate-dependent* description of the transformation. Changing the basis changes the matrix. For a linear operator, different representations are related by similarity transformations \( (A' = P^{-1} A P) \), and key properties (determinant, trace, eigenvalues) remain unchanged.
This framework allows us to move fluidly between the abstract, geometric view of linear transformations and the concrete, computational world of matrices, making it one of the most powerful tools in mathematics.
Matrices and Their Fundamental Properties
- Matrix Definition and Types
A matrix is a rectangular array of numbers, symbols, or expressions, arranged in rows and columns. The individual items in a matrix are called its elements or entries. A matrix with \( m \) rows and \( n \) columns is said to be an \( m \times n \) matrix.
Notation: A matrix is typically denoted by a capital letter (e.g., \( A \)). Its element in the \( i \)-th row and \( j \)-th column is denoted by a lowercase letter with subscripts, such as \( a_{ij} \).
\[A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}\]
Common Types of Matrices
- Square Matrix: A matrix with the same number of rows and columns (\( m = n \)).
* Example: \( \begin{bmatrix} 2 & 7 \\ -1 & 4 \end{bmatrix} \) is a \( 2 \times 2 \) matrix.
- Identity Matrix (\( I_n \)): A square matrix with 1's on the main diagonal and 0's elsewhere. It acts as the multiplicative identity in matrix algebra (\( AI = IA = A \)).
* Example: \( I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \).
- Zero Matrix (\( 0 \)): A matrix where every element is zero.
* Example: \( \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \).
- Diagonal Matrix: A square matrix where all entries outside the main diagonal are zero.
* Example: \( \begin{bmatrix} 3 & 0 & 0 \\ 0 & -5 & 0 \\ 0 & 0 & 1 \end{bmatrix} \).
- Triangular Matrix:
* Upper Triangular: All entries below the main diagonal are zero.
* Example: \( \begin{bmatrix} 2 & 3 & 1 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix} \).
* Lower Triangular: All entries above the main diagonal are zero.
* Example: \( \begin{bmatrix} 2 & 0 & 0 \\ 1 & 4 & 0 \\ 3 & 2 & 6 \end{bmatrix} \).
- Symmetric Matrix: A square matrix that is equal to its transpose (\( A = A^T \)).
* Example: \( A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix} = A^T \).
- Rank of a Matrix
The rank of a matrix is one of the most fundamental concepts in linear algebra. It provides a measure of the "non-degeneracy" of the system of linear equations the matrix represents.
Definition
The rank of a matrix \( A \) is defined as the maximum number of linearly independent column vectors in the matrix (this is the column rank). Equivalently, it is the maximum number of linearly independent row vectors (the row rank). A key theorem states that for any matrix, the column rank equals the row rank.
In simpler terms: The rank is the dimension of the vector space generated (or spanned) by its columns or rows. It tells you the number of "essential" pieces of information in the matrix.
How to Find the Rank
The most systematic method is using Gaussian Elimination to reduce the matrix to its Row Echelon Form (REF). The number of non-zero rows in the REF is the rank of the matrix.
* Pivot: The first non-zero number in a non-zero row of a matrix in REF.
* Rank = Number of Pivots.
Example:
Find the rank of \( A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 3 & 2 \end{bmatrix} \).
- Perform row operations to get REF:
* \( R_2 \to R_2 - 2R_1 \): \( \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ 1 & 3 & 2 \end{bmatrix} \)
* \( R_3 \to R_3 - R_1 \): \( \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \end{bmatrix} \)
* Swap \( R_2 \) and \( R_3 \): \( \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix} \). This is the REF.
- Count the non-zero rows: There are 2 non-zero rows.
- Conclusion: \( \text{rank}(A) = 2 \).
Interpretation: Even though \( A \) has 3 rows and 3 columns, the information they contain can be described by only 2 linearly independent vectors. The second row is simply twice the first row, making it redundant.
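For larger matrices, the row reduction itself can be delegated to SymPy; its `rref` method returns the reduced row echelon form together with the pivot columns, and the number of pivots is the rank (a sketch, not part of the worked example):

```python
from sympy import Matrix

A = Matrix([[1, 2, 1],
            [2, 4, 2],
            [1, 3, 2]])

rref_form, pivots = A.rref()   # reduced row echelon form and pivot column indices
print(rref_form)               # Matrix([[1, 0, -1], [0, 1, 1], [0, 0, 0]])
print(len(pivots))             # 2  ->  rank(A) = 2
```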
- Column Space and Row Space
Column Space
The column space , denoted \( \text{Col}(A) \), of an \( m \times n \) matrix \( A \) is the set of all possible linear combinations of its column vectors. It is a subspace of \( \mathbb{R}^m \).
\[\text{Col}(A) = \{ \mathbf{y} \in \mathbb{R}^m \mid \mathbf{y} = A\mathbf{x} \text{ for some } \mathbf{x} \in \mathbb{R}^n \}\]
In simpler terms: It's all the vectors \( \mathbf{b} \) for which the equation \( A\mathbf{x} = \mathbf{b} \) has a solution. The dimension of the column space is equal to the rank of \( A \).
Row Space
The row space , denoted \( \text{Row}(A) \), of an \( m \times n \) matrix \( A \) is the set of all possible linear combinations of its row vectors. It is a subspace of \( \mathbb{R}^n \). The dimension of the row space is also equal to the rank of \( A \).
Example:
Let \( A = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 1 & 3 \\ 3 & 1 & 4 \end{bmatrix} \).
* Column Vectors: \( \mathbf{a}_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \mathbf{a}_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \mathbf{a}_3 = \begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix} \).
* Column Space (\( \text{Col}(A) \)): This is the span of \( \{\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3\} \). Notice that \( \mathbf{a}_3 = \mathbf{a}_1 + \mathbf{a}_2 \). So, \( \mathbf{a}_3 \) is linearly dependent on \( \mathbf{a}_1 \) and \( \mathbf{a}_2 \). A basis for \( \text{Col}(A) \) is \( \{\mathbf{a}_1, \mathbf{a}_2\} \). Thus, \( \text{dim}(\text{Col}(A)) = 2 \).
* Row Vectors: \( \mathbf{r}_1 = \begin{bmatrix} 1 & 0 & 1 \end{bmatrix}, \mathbf{r}_2 = \begin{bmatrix} 2 & 1 & 3 \end{bmatrix}, \mathbf{r}_3 = \begin{bmatrix} 3 & 1 & 4 \end{bmatrix} \).
* Row Space (\( \text{Row}(A) \)): This is the span of the rows. Notice \( \mathbf{r}_3 = \mathbf{r}_1 + \mathbf{r}_2 \). A basis for \( \text{Row}(A) \) is \( \{\mathbf{r}_1, \mathbf{r}_2\} \). Thus, \( \text{dim}(\text{Row}(A)) = 2 \).
Key Observation: \( \text{rank}(A) = \text{dim}(\text{Col}(A)) = \text{dim}(\text{Row}(A)) = 2 \).
- Rank Factorization
Rank factorization is a powerful way to decompose a matrix based on its rank. It reveals the underlying structure by separating the independent columns and rows.
Theorem (Rank Factorization)
If an \( m \times n \) matrix \( A \) has rank \( r \), then it can be factorized as:
\[A = CR\]
where:
* \( C \) is an \( m \times r \) matrix whose columns are a basis for the column space of \( A \).
* \( R \) is an \( r \times n \) matrix whose rows are a basis for the row space of \( A \).
Why is this useful? It compresses the matrix. Instead of storing all \( m \times n \) entries, we only need to store the \( m \times r \) matrix \( C \) and the \( r \times n \) matrix \( R \), which is more efficient if the rank \( r \) is small.
How to Perform Rank Factorization
- Find the rank \( r \) of \( A \) (e.g., by finding the REF).
- Identify \( r \) linearly independent columns of \( A \). These will form the matrix \( C \).
- Express every column of \( A \) as a linear combination of the columns of \( C \). The coefficients of these combinations form the rows of the matrix \( R \).
Example:
Let's find the rank factorization of \( A = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 1 & 3 \\ 3 & 1 & 4 \end{bmatrix} \).
- Find the Rank: From the previous example, we know \( \text{rank}(A) = r = 2 \).
- Identify Independent Columns: We saw that \( \mathbf{a}_1 \) and \( \mathbf{a}_2 \) are linearly independent. So, we form matrix \( C \) from them:
\[C = \begin{bmatrix} 1 & 0 \\ 2 & 1 \\ 3 & 1 \end{bmatrix}\]
- Find Matrix \( R \): We need to express every column of \( A \) as a linear combination of the columns of \( C \).
* Column 1 of \( A \) (\( \mathbf{a}_1 \)): \( \mathbf{a}_1 = 1\cdot \mathbf{c}_1 + 0\cdot \mathbf{c}_2 \). Coefficients: \( (1, 0) \).
* Column 2 of \( A \) (\( \mathbf{a}_2 \)): \( \mathbf{a}_2 = 0\cdot \mathbf{c}_1 + 1\cdot \mathbf{c}_2 \). Coefficients: \( (0, 1) \).
* Column 3 of \( A \) (\( \mathbf{a}_3 \)): \( \mathbf{a}_3 = \mathbf{a}_1 + \mathbf{a}_2 = 1\cdot \mathbf{c}_1 + 1\cdot \mathbf{c}_2 \). Coefficients: \( (1, 1) \).
These coefficient vectors become the rows of \( R \). The first row of \( R \) corresponds to the coefficients for the first column of \( A \), and so on.
\[R = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}\]
*(Note: The order of rows in \( R \) must match the order of the basis vectors in \( C \). Here, the first row of \( R \) corresponds to the combinations for the first basis vector \( \mathbf{c}_1 \), and the second row for \( \mathbf{c}_2 \).)*
- Verification:
\[A = CR = \begin{bmatrix} 1 & 0 \\ 2 & 1 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 1 & 3 \\ 3 & 1 & 4 \end{bmatrix}\]
The factorization is correct.
This decomposition shows that the entire structure of the \( 3 \times 3 \) matrix \( A \) is captured by its two independent columns and the two independent rows of the coefficient matrix \( R \).
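A NumPy sketch of this factorization (the coefficients in \( R \) are recovered by least squares, which is exact here):

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [2.0, 1.0, 3.0],
              [3.0, 1.0, 4.0]])

# C: the two linearly independent columns of A found above.
C = A[:, :2]

# Each column of R holds the coefficients expressing the matching column of A
# in terms of the columns of C.
R, *_ = np.linalg.lstsq(C, A, rcond=None)
print(np.round(R, 10))        # [[1. 0. 1.]
                              #  [0. 1. 1.]]
print(np.allclose(C @ R, A))  # True: A = CR
```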
Composition and Inverses of Linear Transformations
- Composition of Linear Transformations
Definition
If \( T: U \to V \) and \( S: V \to W \) are linear transformations, then their composition \( S \circ T: U \to W \) is defined by:
\[(S \circ T)(\mathbf{u}) = S(T(\mathbf{u})) \quad \text{for all } \mathbf{u} \in U\]
Key Properties
- Linearity: The composition of linear transformations is always linear
- Associativity: \( (R \circ S) \circ T = R \circ (S \circ T) \)
- Matrix Representation: If \( T \) has matrix \( A \) and \( S \) has matrix \( B \), then \( S \circ T \) has matrix \( BA \)
Example 1: Geometric Transformations
Let \( T: \mathbb{R}^2 \to \mathbb{R}^2 \) be rotation by \( 90^\circ \) counterclockwise, and \( S: \mathbb{R}^2 \to \mathbb{R}^2 \) be reflection across the x-axis.
Matrices:
- \( T \) has matrix: \( A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \)
- \( S \) has matrix: \( B = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \)
Composition \( S \circ T \):
\[BA = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}\]
Let's apply to vector \( \mathbf{v} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \):
- \( T(\mathbf{v}) = \begin{bmatrix} -2 \\ 1 \end{bmatrix} \) (rotation)
- \( S(T(\mathbf{v})) = \begin{bmatrix} -2 \\ -1 \end{bmatrix} \) (reflection)
- Direct: \( (S \circ T)(\mathbf{v}) = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} -2 \\ -1 \end{bmatrix} \)
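The same check in NumPy (a sketch): applying the transformations step by step agrees with applying the single composed matrix \( BA \):

```python
import numpy as np

A = np.array([[0.0, -1.0],    # T: rotation by 90 degrees counterclockwise
              [1.0,  0.0]])
B = np.array([[1.0,  0.0],    # S: reflection across the x-axis
              [0.0, -1.0]])

v = np.array([1.0, 2.0])
print(B @ (A @ v))   # [-2. -1.]  rotate first, then reflect
print((B @ A) @ v)   # [-2. -1.]  same result via the composed matrix BA
```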
Example 2: Different Dimensions
Let \( T: \mathbb{R}^2 \to \mathbb{R}^3 \) with matrix \( A = \begin{bmatrix} 1 & 0 \\ 2 & 1 \\ 0 & 1 \end{bmatrix} \)
Let \( S: \mathbb{R}^3 \to \mathbb{R}^2 \) with matrix \( B = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \)
Composition \( S \circ T: \mathbb{R}^2 \to \mathbb{R}^2 \):
\[BA = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 2 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 2 & 1 \end{bmatrix} \]
- Inverse of a Linear Transformation
Definition
A linear transformation \( T: V \to W \) is invertible if there exists a linear transformation \( T^{-1}: W \to V \) such that:
\[T^{-1} \circ T = \text{id}_V \quad \text{and} \quad T \circ T^{-1} = \text{id}_W\]
where \( \text{id}_V \) and \( \text{id}_W \) are identity transformations on \( V \) and \( W \) respectively.
Conditions for Invertibility
- \( T \) must be one-to-one (injective) and onto (surjective)
- For finite dimensions: \( \dim(V) = \dim(W) \)
- The matrix representation of \( T \) must be square and invertible
Example 1: Finding an Inverse
Let \( T: \mathbb{R}^2 \to \mathbb{R}^2 \) with matrix \( A = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \)
Step 1: Check invertibility
\[\det(A) = (2)(1) - (1)(1) = 1 \neq 0 \quad \Rightarrow \quad \text{Invertible} \]
Step 2: Find inverse matrix
\[A^{-1} = \frac{1}{1} \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}\]
Step 3: Verify
Let \( \mathbf{v} = \begin{bmatrix} 3 \\ 4 \end{bmatrix} \):
- \( T(\mathbf{v}) = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 10 \\ 7 \end{bmatrix} \)
- \( T^{-1}(T(\mathbf{v})) = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 10 \\ 7 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \end{bmatrix} \)
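A NumPy sketch of the same round trip:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])

A_inv = np.linalg.inv(A)
v = np.array([3.0, 4.0])
print(A @ v)            # [10.  7.]
print(A_inv @ (A @ v))  # [3. 4.]  ->  T^{-1}(T(v)) = v
```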
Example 2: Non-invertible Transformation
Let \( T: \mathbb{R}^2 \to \mathbb{R}^2 \) with matrix \( A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \)
\[\det(A) = (1)(4) - (2)(2) = 0 \quad \Rightarrow \quad \text{Not invertible}\]
Why? The columns are linearly dependent: \( \begin{bmatrix} 2 \\ 4 \end{bmatrix} = 2 \begin{bmatrix} 1 \\ 2 \end{bmatrix} \)
- Range and Nullspace
Range (Image)
The range of a linear transformation \( T: V \to W \) is:
\[\text{Range}(T) = \{ T(\mathbf{v}) \in W \mid \mathbf{v} \in V \}\]
- It's a subspace of \( W \)
- Also called the image of \( T \)
- \( \dim(\text{Range}(T)) = \text{rank}(T) \)
Nullspace (Kernel)
The nullspace of a linear transformation \( T: V \to W \) is:
\[\text{Null}(T) = \{ \mathbf{v} \in V \mid T(\mathbf{v}) = \mathbf{0} \}\]
- It's a subspace of \( V \)
- Also called the kernel of \( T \)
- \( \dim(\text{Null}(T)) = \text{nullity}(T) \)
Example: Finding Range and Nullspace
Let \( T: \mathbb{R}^3 \to \mathbb{R}^2 \) with matrix \( A = \begin{bmatrix} 1 & 2 & 0 \\ 2 & 1 & 1 \end{bmatrix} \)
Finding the Nullspace:
Solve \( A\mathbf{x} = \mathbf{0} \):
\[\begin{bmatrix} 1 & 2 & 0 \\ 2 & 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}\]
Row reduction:
\[\begin{bmatrix} 1 & 2 & 0 \\ 0 & -3 & 1 \end{bmatrix} \Rightarrow \begin{cases} x_1 + 2x_2 = 0 \\ -3x_2 + x_3 = 0 \end{cases}\]
Let \( x_2 = t \), then:
- \( x_1 = -2t \)
- \( x_3 = 3t \)
\[\text{Null}(T) = \left\{ t \begin{bmatrix} -2 \\ 1 \\ 3 \end{bmatrix} \mid t \in \mathbb{R} \right\}\]
\[\text{Basis for Null}(T) = \left\{ \begin{bmatrix} -2 \\ 1 \\ 3 \end{bmatrix} \right\}, \quad \text{nullity}(T) = 1\]
Finding the Range:
The range is spanned by the columns of \( A \):
\[\text{Col}(A) = \text{span}\left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}\]
Check linear independence:
- \( \begin{bmatrix} 0 \\ 1 \end{bmatrix} \) is not a multiple of \( \begin{bmatrix} 1 \\ 2 \end{bmatrix} \)
- These two columns are linearly independent, and two independent vectors already span \( \mathbb{R}^2 \)
- So basis: \( \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\} \)
\[\text{Range}(T) = \mathbb{R}^2, \quad \text{rank}(T) = 2\]
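The nullspace and rank of this example can also be computed numerically (a sketch using SciPy's `null_space` and NumPy's `matrix_rank`):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 2.0, 0.0],
              [2.0, 1.0, 1.0]])

N = null_space(A)                # orthonormal basis of Null(T), proportional to (-2, 1, 3)
print(N.shape[1])                # 1  ->  nullity(T) = 1
print(np.allclose(A @ N, 0))     # True
print(np.linalg.matrix_rank(A))  # 2  ->  rank(T) = 2, so Range(T) = R^2
```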
- Dimension Theorem (Rank-Nullity Theorem)
The Theorem
For any linear transformation \( T: V \to W \) where \( V \) is finite-dimensional:
\[\dim(V) = \text{rank}(T) + \text{nullity}(T)\]
or equivalently:
\[\dim(V) = \dim(\text{Range}(T)) + \dim(\text{Null}(T))\]
Interpretation
- Rank: Number of "independent directions" in the output
- Nullity: Number of "independent directions" that get collapsed to zero
- Sum: Total "information capacity" of the input space
Example 1: Verification
From our previous example with \( T: \mathbb{R}^3 \to \mathbb{R}^2 \):
- \( \dim(V) = \dim(\mathbb{R}^3) = 3 \)
- \( \text{rank}(T) = 2 \)
- \( \text{nullity}(T) = 1 \)
- Check: \( 3 = 2 + 1 \) \( \checkmark \)
Example 2: Injective and Surjective Cases
Case 1: Injective (one-to-one) transformation
Let \( T: \mathbb{R}^2 \to \mathbb{R}^3 \) with matrix \( A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \)
- \( \text{Null}(T) = \{\mathbf{0}\} \), so nullity = 0
- \( \text{Range}(T) \) is a 2D plane in \( \mathbb{R}^3 \), so rank = 2
- \( 2 = 2 + 0 \) \( \checkmark \)
Case 2: Surjective (onto) transformation
Let \( T: \mathbb{R}^3 \to \mathbb{R}^2 \) with matrix \( A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \)
- \( \text{Null}(T) = \text{span}\left\{ \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\} \), so nullity = 1
- \( \text{Range}(T) = \mathbb{R}^2 \), so rank = 2
- \( 3 = 2 + 1 \) \( \checkmark \)
Case 3: Isomorphism (bijective)
Let \( T: \mathbb{R}^2 \to \mathbb{R}^2 \) with matrix \( A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \)
- \( \det(A) = -2 \neq 0 \), so nullity = 0 and rank = 2
- \( 2 = 2 + 0 \) \( \checkmark \)
Applications of Dimension Theorem
- Quick dimension checks: If \( T: \mathbb{R}^n \to \mathbb{R}^m \) and \( \text{rank}(T) = r \), then \( \text{nullity}(T) = n - r \)
- Impossibility results:
- No linear transformation \( T: \mathbb{R}^n \to \mathbb{R}^m \) can be injective if \( n > m \)
- No linear transformation \( T: \mathbb{R}^n \to \mathbb{R}^m \) can be surjective if \( n < m \)
- System solving: For \( A\mathbf{x} = \mathbf{b} \), the solution set (if it exists) is an affine space of dimension equal to nullity(\( A \))
The dimension theorem provides a fundamental constraint that governs all linear transformations between finite-dimensional vector spaces, revealing deep connections between the input space, the transformation's "compression" behavior (nullspace), and its "coverage" behavior (range).
Euclidean Spaces and Inner Product
- Definition of Euclidean Spaces
A Euclidean space is a mathematical space that generalizes the 2D plane and 3D physical space to any number of dimensions, where distances and angles are measured using the Euclidean metric (the familiar Pythagorean formula).
Euclidean n-Space
The Euclidean n-space , denoted \( \mathbb{R}^n \), is the set of all ordered n-tuples of real numbers:
\[\mathbb{R}^n = \{ (x_1, x_2, \ldots, x_n) \mid x_i \in \mathbb{R} \text{ for } i = 1, 2, \ldots, n \}\]
Each element \( \mathbf{x} = (x_1, x_2, \ldots, x_n) \) is called a vector . The Euclidean space comes equipped with:
- Vector addition: \( \mathbf{x} + \mathbf{y} = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n) \)
- Scalar multiplication: \( c\mathbf{x} = (cx_1, cx_2, \ldots, cx_n) \)
- Zero vector: \( \mathbf{0} = (0, 0, \ldots, 0) \)
Examples:
- \( \mathbb{R}^2 \): The familiar Cartesian plane
- \( \mathbb{R}^3 \): Three-dimensional space
- \( \mathbb{R}^4 \): Four-dimensional space (used in relativity, computer graphics)
- Inner Product and Its Properties
Dot Product (Standard Inner Product)
For vectors \( \mathbf{x} = (x_1, x_2, \ldots, x_n) \) and \( \mathbf{y} = (y_1, y_2, \ldots, y_n) \) in \( \mathbb{R}^n \), the dot product is defined as:
\[\mathbf{x} \cdot \mathbf{y} = x_1y_1 + x_2y_2 + \cdots + x_ny_n = \sum_{i=1}^n x_iy_i\]
Example 1:
Let \( \mathbf{u} = (1, 2, 3) \), \( \mathbf{v} = (4, 5, 6) \) in \( \mathbb{R}^3 \)
\[\mathbf{u} \cdot \mathbf{v} = (1)(4) + (2)(5) + (3)(6) = 4 + 10 + 18 = 32\]
Example 2:
Let \( \mathbf{a} = (2, -1, 3, 1) \), \( \mathbf{b} = (1, 0, -2, 4) \) in \( \mathbb{R}^4 \)
\[\mathbf{a} \cdot \mathbf{b} = (2)(1) + (-1)(0) + (3)(-2) + (1)(4) = 2 + 0 - 6 + 4 = 0\]
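Both dot products in NumPy (a minimal sketch):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
print(np.dot(u, v))   # 32.0

a = np.array([2.0, -1.0, 3.0, 1.0])
b = np.array([1.0, 0.0, -2.0, 4.0])
print(np.dot(a, b))   # 0.0
```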
General Inner Product
An inner product on a real vector space \( V \) is a function \( \langle \cdot, \cdot \rangle: V \times V \to \mathbb{R} \) satisfying:
- Symmetry: \( \langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle \)
- Linearity in first argument:
- \( \langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle \)
- \( \langle c\mathbf{u}, \mathbf{v} \rangle = c\langle \mathbf{u}, \mathbf{v} \rangle \)
- Positive definiteness:
- \( \langle \mathbf{u}, \mathbf{u} \rangle \geq 0 \)
- \( \langle \mathbf{u}, \mathbf{u} \rangle = 0 \) if and only if \( \mathbf{u} = \mathbf{0} \)
Norm (Length) from Inner Product
The norm (or length) of a vector is defined as:
\[\|\mathbf{u}\| = \sqrt{\langle \mathbf{u}, \mathbf{u} \rangle}\]
For the standard dot product:
\[\|\mathbf{u}\| = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}\]
Example:
For \( \mathbf{u} = (3, 4) \) in \( \mathbb{R}^2 \):
\[\|\mathbf{u}\| = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5\]
Distance Between Vectors
The distance between vectors \( \mathbf{u} \) and \( \mathbf{v} \) is:
\[d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|\]
Example:
For \( \mathbf{u} = (1, 2) \), \( \mathbf{v} = (4, 6) \) in \( \mathbb{R}^2 \):
\[\mathbf{u} - \mathbf{v} = (-3, -4), \quad \|\mathbf{u} - \mathbf{v}\| = \sqrt{(-3)^2 + (-4)^2} = 5\]
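The norm and distance computations in NumPy (a sketch):

```python
import numpy as np

u = np.array([3.0, 4.0])
print(np.linalg.norm(u))       # 5.0

u = np.array([1.0, 2.0])
v = np.array([4.0, 6.0])
print(np.linalg.norm(u - v))   # 5.0  ->  d(u, v)
```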
- Cauchy-Schwarz Inequality
Theorem (Cauchy-Schwarz Inequality)
For any vectors \( \mathbf{u}, \mathbf{v} \) in an inner product space:
\[|\langle \mathbf{u}, \mathbf{v} \rangle| \leq \|\mathbf{u}\| \cdot \|\mathbf{v}\|\]
Equality holds if and only if \( \mathbf{u} \) and \( \mathbf{v} \) are linearly dependent.
Proof Sketch:
For all real \( t \), consider:
\[\langle \mathbf{u} + t\mathbf{v}, \mathbf{u} + t\mathbf{v} \rangle \geq 0\]
This gives a quadratic in \( t \) that must have non-positive discriminant, leading to the inequality.
Example 1: Verification
Let \( \mathbf{u} = (1, 2, 3) \), \( \mathbf{v} = (4, 5, 6) \)
- \( \mathbf{u} \cdot \mathbf{v} = 32 \)
- \( \|\mathbf{u}\| = \sqrt{1 + 4 + 9} = \sqrt{14} \approx 3.74 \)
- \( \|\mathbf{v}\| = \sqrt{16 + 25 + 36} = \sqrt{77} \approx 8.77 \)
- Check: \( 32 \leq (3.74)(8.77) \approx 32.8 \) \( \checkmark \)
Example 2: Application to Angles
In \( \mathbb{R}^2 \), the angle \( \theta \) between vectors satisfies:
\[\cos \theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\|\mathbf{v}\|}\]
By Cauchy-Schwarz: \( -1 \leq \cos \theta \leq 1 \)
Let \( \mathbf{u} = (1, 0) \), \( \mathbf{v} = (1, 1) \):
\[\cos \theta = \frac{1}{\sqrt{2}} \Rightarrow \theta = 45^\circ\]
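A NumPy sketch that checks the Cauchy-Schwarz bound and recovers the \( 45^\circ \) angle:

```python
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])

cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(abs(np.dot(u, v)) <= np.linalg.norm(u) * np.linalg.norm(v))  # True (Cauchy-Schwarz)
print(np.degrees(np.arccos(cos_theta)))                            # 45.0 (up to rounding)
```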
Consequences of Cauchy-Schwarz
- Triangle Inequality: \( \|\mathbf{u} + \mathbf{v}\| \leq \|\mathbf{u}\| + \|\mathbf{v}\| \)
- Distance Inequality: \( d(\mathbf{u}, \mathbf{w}) \leq d(\mathbf{u}, \mathbf{v}) + d(\mathbf{v}, \mathbf{w}) \)
Proof of Triangle Inequality:
\[\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + 2\langle \mathbf{u}, \mathbf{v} \rangle + \|\mathbf{v}\|^2 \leq \|\mathbf{u}\|^2 + 2\|\mathbf{u}\|\|\mathbf{v}\| + \|\mathbf{v}\|^2 = (\|\mathbf{u}\| + \|\mathbf{v}\|)^2\]
- Orthogonality
Definition
Two vectors \( \mathbf{u} \) and \( \mathbf{v} \) are orthogonal (perpendicular) if:
\[\langle \mathbf{u}, \mathbf{v} \rangle = 0\]
We write \( \mathbf{u} \perp \mathbf{v} \).
Example 1: Standard Basis Vectors
In \( \mathbb{R}^3 \), the standard basis vectors are mutually orthogonal:
- \( \mathbf{e}_1 = (1, 0, 0) \), \( \mathbf{e}_2 = (0, 1, 0) \), \( \mathbf{e}_3 = (0, 0, 1) \)
- \( \mathbf{e}_i \cdot \mathbf{e}_j = 0 \) for \( i \neq j \)
Example 2:
Let \( \mathbf{u} = (1, 1) \), \( \mathbf{v} = (1, -1) \) in \( \mathbb{R}^2 \):
\[\mathbf{u} \cdot \mathbf{v} = (1)(1) + (1)(-1) = 0 \Rightarrow \mathbf{u} \perp \mathbf{v}\]
Orthogonal Complement
For a subspace \( W \subseteq \mathbb{R}^n \), the orthogonal complement \( W^\perp \) is:
\[W^\perp = \{ \mathbf{v} \in \mathbb{R}^n \mid \mathbf{v} \cdot \mathbf{w} = 0 \text{ for all } \mathbf{w} \in W \}\]
Example:
Let \( W = \text{span}\{(1, 1, 0)\} \) in \( \mathbb{R}^3 \). Find \( W^\perp \).
A vector \( (x, y, z) \in W^\perp \) must satisfy:
\[(x, y, z) \cdot (1, 1, 0) = x + y = 0 \Rightarrow y = -x\]
So \( W^\perp = \{ (x, -x, z) \mid x, z \in \mathbb{R} \} = \text{span}\{(1, -1, 0), (0, 0, 1)\} \)
Orthogonal Projection
The orthogonal projection of vector \( \mathbf{u} \) onto vector \( \mathbf{v} \) is:
\[\text{proj}_{\mathbf{v}}(\mathbf{u}) = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\langle \mathbf{v}, \mathbf{v} \rangle} \mathbf{v}\]
Example:
Project \( \mathbf{u} = (4, 3) \) onto \( \mathbf{v} = (2, 0) \):
\[\text{proj}_{\mathbf{v}}(\mathbf{u}) = \frac{(4, 3) \cdot (2, 0)}{(2, 0) \cdot (2, 0)} (2, 0) = \frac{8}{4}(2, 0) = (4, 0)\]
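The projection formula translates directly into code; `proj` below is an illustrative helper, not a library function:

```python
import numpy as np

def proj(u, v):
    # Orthogonal projection of u onto the line spanned by v.
    return (np.dot(u, v) / np.dot(v, v)) * v

u = np.array([4.0, 3.0])
v = np.array([2.0, 0.0])
p = proj(u, v)
print(p)                 # [4. 0.]
print(np.dot(u - p, v))  # 0.0  ->  the residual u - p is orthogonal to v
```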
Orthogonal Sets and Bases
Orthogonal Set: A set of vectors \( \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\} \) where \( \mathbf{v}_i \cdot \mathbf{v}_j = 0 \) for \( i \neq j \)
Orthonormal Set: An orthogonal set where all vectors have unit length
Example: Orthogonal Basis for \( \mathbb{R}^2 \)
\[\left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\}\]
These vectors are orthogonal but not orthonormal. To make them orthonormal:
\[\left\{ \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\}\]
Gram-Schmidt Orthogonalization Process
This algorithm converts any basis into an orthogonal basis.
Given basis: \( \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\} \)
Step 1: \( \mathbf{v}_1 = \mathbf{u}_1 \)
Step 2: \( \mathbf{v}_2 = \mathbf{u}_2 - \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\langle \mathbf{v}_1, \mathbf{v}_1 \rangle} \mathbf{v}_1 \)
Step 3: \( \mathbf{v}_3 = \mathbf{u}_3 - \frac{\langle \mathbf{u}_3, \mathbf{v}_1 \rangle}{\langle \mathbf{v}_1, \mathbf{v}_1 \rangle} \mathbf{v}_1 - \frac{\langle \mathbf{u}_3, \mathbf{v}_2 \rangle}{\langle \mathbf{v}_2, \mathbf{v}_2 \rangle} \mathbf{v}_2 \)
Continue this process...
Example:
Apply Gram-Schmidt to \( \mathbf{u}_1 = (1, 1, 0) \), \( \mathbf{u}_2 = (1, 0, 1) \), \( \mathbf{u}_3 = (0, 1, 1) \)
Step 1: \( \mathbf{v}_1 = (1, 1, 0) \)
Step 2:
\[\mathbf{v}_2 = (1, 0, 1) - \frac{(1, 0, 1) \cdot (1, 1, 0)}{(1, 1, 0) \cdot (1, 1, 0)} (1, 1, 0) = (1, 0, 1) - \frac{1}{2}(1, 1, 0) = \left(\frac{1}{2}, -\frac{1}{2}, 1\right)\]
Step 3:
\[\mathbf{v}_3 = (0, 1, 1) - \frac{(0, 1, 1) \cdot (1, 1, 0)}{2}(1, 1, 0) - \frac{(0, 1, 1) \cdot (\frac{1}{2}, -\frac{1}{2}, 1)}{\|\mathbf{v}_2\|^2} \mathbf{v}_2\]
\[= (0, 1, 1) - \frac{1}{2}(1, 1, 0) - \frac{-\frac{1}{2} + 1}{\frac{3}{2}} \left(\frac{1}{2}, -\frac{1}{2}, 1\right)\]
\[= \left(-\frac{2}{3}, \frac{2}{3}, \frac{2}{3}\right)\]
The orthogonal basis is: \( \left\{ (1, 1, 0), \left(\frac{1}{2}, -\frac{1}{2}, 1\right), \left(-\frac{2}{3}, \frac{2}{3}, \frac{2}{3}\right) \right\} \)
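A compact Python implementation of the process (a sketch; `gram_schmidt` is an illustrative helper, not a library routine), reproducing the basis above:

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthogonal basis for the span of the given vectors."""
    basis = []
    for u in vectors:
        v = u.astype(float)
        for b in basis:
            # Subtract the projection of u onto each earlier orthogonal vector.
            v = v - (np.dot(u, b) / np.dot(b, b)) * b
        basis.append(v)
    return basis

u1, u2, u3 = np.array([1, 1, 0]), np.array([1, 0, 1]), np.array([0, 1, 1])
for v in gram_schmidt([u1, u2, u3]):
    print(v)
# [1. 1. 0.]
# [ 0.5 -0.5  1. ]
# [-0.66666667  0.66666667  0.66666667]
```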
Applications of Orthogonality
- Least Squares Problems: Solving overdetermined systems
- Signal Processing: Fourier analysis, filtering
- Computer Graphics: 3D rendering, perspective projection
- Statistics: Principal Component Analysis (PCA)
- Quantum Mechanics: Orthogonal states represent distinguishable measurements
The concepts of Euclidean spaces, inner products, and orthogonality provide the mathematical foundation for geometry, optimization, and countless applications in science and engineering.
Spectral Theory - Eigenvalues and Eigenvectors
- Definition of Eigenvalues and Eigenvectors
Fundamental Concept
Eigenvalues and eigenvectors reveal the fundamental "behavior" of linear transformations. They identify special directions where the transformation acts like simple scaling.
Formal Definition
Let \( A \) be an \( n \times n \) square matrix. A non-zero vector \( \mathbf{v} \) is called an eigenvector of \( A \) if there exists a scalar \( \lambda \) such that:
\[A\mathbf{v} = \lambda\mathbf{v}\]
The scalar \( \lambda \) is called the eigenvalue corresponding to the eigenvector \( \mathbf{v} \).
Geometric Interpretation
- Eigenvector: A direction that remains unchanged (except for scaling) under the linear transformation
- Eigenvalue: The factor by which the eigenvector is stretched or compressed
- If \( \lambda > 0 \): The vector keeps its direction
- If \( \lambda < 0 \): The vector reverses direction
- If \( \lambda = 0 \): The vector gets collapsed to the zero vector
Example 1: Simple 2×2 Matrix
Let \( A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \)
Check if \( \mathbf{v} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \) is an eigenvector:
\[A\mathbf{v} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 1 \end{bmatrix}\]
Yes! \( \mathbf{v} \) is an eigenvector with eigenvalue \( \lambda = 3 \).
Check if \( \mathbf{w} = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \) is an eigenvector:
\[A\mathbf{w} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix} = 1\begin{bmatrix} 1 \\ -1 \end{bmatrix}\]
Yes! \( \mathbf{w} \) is an eigenvector with eigenvalue \( \lambda = 1 \).
Example 2: Shear Transformation
Let \( A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \) (horizontal shear)
Check if \( \mathbf{v} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \) is an eigenvector:
\[A\mathbf{v} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} = 1\begin{bmatrix} 1 \\ 0 \end{bmatrix}\]
Yes! Eigenvalue \( \lambda = 1 \).
Check if \( \mathbf{w} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \) is an eigenvector:
\[A\mathbf{w} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \neq \lambda\begin{bmatrix} 0 \\ 1 \end{bmatrix}\]
No, not an eigenvector.
- Finding Eigenvalues: The Characteristic Polynomial
Derivation
We start with the eigenvalue equation:
\[A\mathbf{v} = \lambda\mathbf{v}\]
Rewriting:
\[A\mathbf{v} - \lambda\mathbf{v} = \mathbf{0}\]
\[(A - \lambda I)\mathbf{v} = \mathbf{0}\]
where \( I \) is the identity matrix.
For non-zero solutions \( \mathbf{v} \) to exist, the matrix \( (A - \lambda I) \) must be singular (non-invertible). Therefore:
\[\det(A - \lambda I) = 0\]
Characteristic Polynomial
The characteristic polynomial of an \( n \times n \) matrix \( A \) is:
\[p(\lambda) = \det(A - \lambda I)\]
The eigenvalues are the roots of this polynomial.
Example 1: 2×2 Matrix
Let \( A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \)
Step 1: Form \( A - \lambda I \):
\[A - \lambda I = \begin{bmatrix} 3-\lambda & 1 \\ 1 & 3-\lambda \end{bmatrix}\]
Step 2: Compute determinant:
\[\det(A - \lambda I) = (3-\lambda)(3-\lambda) - (1)(1) = \lambda^2 - 6\lambda + 8\]
Step 3: Solve characteristic equation:
\[\lambda^2 - 6\lambda + 8 = 0 \Rightarrow (\lambda - 2)(\lambda - 4) = 0\]
Eigenvalues: \( \lambda_1 = 2 \), \( \lambda_2 = 4 \)
Example 2: 3×3 Matrix
Let \( A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & 1 \\ -1 & 0 & 1 \end{bmatrix} \)
Step 1: Form \( A - \lambda I \):
\[A - \lambda I = \begin{bmatrix} 2-\lambda & 0 & 0 \\ 1 & 2-\lambda & 1 \\ -1 & 0 & 1-\lambda \end{bmatrix}\]
Step 2: Compute determinant (using first row):
\[\det(A - \lambda I) = (2-\lambda)\begin{vmatrix} 2-\lambda & 1 \\ 0 & 1-\lambda \end{vmatrix} - 0 + 0\]
\[= (2-\lambda)[(2-\lambda)(1-\lambda) - 0] = (2-\lambda)^2(1-\lambda)\]
Step 3: Solve characteristic equation:
\[(2-\lambda)^2(1-\lambda) = 0\]
Eigenvalues: \( \lambda_1 = 2 \) (multiplicity 2), \( \lambda_2 = 1 \) (multiplicity 1)
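Both characteristic polynomials can be cross-checked numerically. A minimal sketch: np.linalg.eigvals returns the eigenvalues directly, and np.poly returns the coefficients of the characteristic polynomial of a square matrix.
import numpy as np
A2 = np.array([[3.0, 1.0], [1.0, 3.0]])
A3 = np.array([[2.0, 0.0, 0.0],
               [1.0, 2.0, 1.0],
               [-1.0, 0.0, 1.0]])
print(np.linalg.eigvals(A2))   # approximately [4. 2.] (order may vary)
print(np.linalg.eigvals(A3))   # approximately [2. 2. 1.] (order may vary)
print(np.poly(A2))             # [ 1. -6.  8.]  ->  lambda^2 - 6*lambda + 8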
Properties of Characteristic Polynomial
- For an \( n \times n \) matrix, the characteristic polynomial has degree \( n \)
- The sum of eigenvalues = trace of \( A \)
- The product of eigenvalues = determinant of \( A \)
- Complex eigenvalues of real matrices come in conjugate pairs
- Finding Eigenvectors: Eigenspaces
Finding Eigenvectors
For each eigenvalue \( \lambda \), we find eigenvectors by solving:
\[(A - \lambda I)\mathbf{v} = \mathbf{0}\]
This is a homogeneous system of linear equations.
Eigenspace
The eigenspace corresponding to eigenvalue \( \lambda \) is:
\[E_\lambda = \{ \mathbf{v} \in \mathbb{R}^n \mid A\mathbf{v} = \lambda\mathbf{v} \} = \text{Null}(A - \lambda I)\]
- It's a subspace of \( \mathbb{R}^n \)
- Contains all eigenvectors for \( \lambda \) plus the zero vector
- \( \dim(E_\lambda) \) is called the geometric multiplicity of \( \lambda \)
Example 1: Complete Example for 2×2 Matrix
Let \( A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \)
We found eigenvalues: \( \lambda_1 = 2 \), \( \lambda_2 = 4 \)
For \( \lambda_1 = 2 \):
\[A - 2I = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\]
Solve \( \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \)
Equation: \( x + y = 0 \Rightarrow y = -x \)
Eigenvectors: \( \begin{bmatrix} t \\ -t \end{bmatrix} = t\begin{bmatrix} 1 \\ -1 \end{bmatrix} \)
Eigenspace: \( E_2 = \text{span}\left\{ \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\} \), Geometric multiplicity = 1
For \( \lambda_2 = 4 \):
\[A - 4I = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix}\]
Solve \( \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \)
Equation: \( -x + y = 0 \Rightarrow y = x \)
Eigenvectors: \( \begin{bmatrix} t \\ t \end{bmatrix} = t\begin{bmatrix} 1 \\ 1 \end{bmatrix} \)
Eigenspace: \( E_4 = \text{span}\left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\} \), Geometric multiplicity = 1
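The same eigenpairs can be obtained with np.linalg.eig. Note that a numerical routine returns unit-length eigenvectors, so they may differ from the hand-computed basis vectors by a scalar factor or a sign. A minimal sketch:
import numpy as np
A = np.array([[3.0, 1.0], [1.0, 3.0]])
vals, vecs = np.linalg.eig(A)
print(vals)   # [4. 2.] (order may vary)
print(vecs)   # columns are unit eigenvectors, proportional to (1, 1) and (1, -1)
# Verify A v = lambda v for each eigenpair
for lam, v in zip(vals, vecs.T):
    print(np.allclose(A @ v, lam * v))   # True, True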
Example 2: Multiple Eigenvalues
Let \( A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & 1 \\ -1 & 0 & 1 \end{bmatrix} \)
We found eigenvalues: \( \lambda_1 = 2 \) (multiplicity 2), \( \lambda_2 = 1 \) (multiplicity 1)
For \( \lambda_1 = 2 \):
\[A - 2I = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 1 \\ -1 & 0 & -1 \end{bmatrix}\]
Solve system:
- From row 2: \( x + z = 0 \Rightarrow z = -x \)
- From row 3: \( -x - z = 0 \Rightarrow \) same equation
- Row 1 gives no information
Solution: \( \begin{bmatrix} x \\ y \\ -x \end{bmatrix} = x\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} + y\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \)
Eigenspace: \( E_2 = \text{span}\left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\} \)
Geometric multiplicity = 2, Algebraic multiplicity = 2
For \( \lambda_2 = 1 \):
\[A - I = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ -1 & 0 & 0 \end{bmatrix}\]
Solve system:
- From row 1: \( x = 0 \)
- From row 3: \( -x = 0 \Rightarrow x = 0 \) (same)
- From row 2 with \( x = 0 \): \( y + z = 0 \Rightarrow z = -y \)
Solution: \( \begin{bmatrix} 0 \\ y \\ -y \end{bmatrix} = y\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} \)
Eigenspace: \( E_1 = \text{span}\left\{ \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} \right\} \), Geometric multiplicity = 1
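The eigenspace bases and geometric multiplicities can be cross-checked by computing null spaces numerically; a sketch using scipy.linalg.null_space (each column of the result is a basis vector of \( \text{Null}(A - \lambda I) \), so the number of columns is the geometric multiplicity):
import numpy as np
from scipy.linalg import null_space
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 2.0, 1.0],
              [-1.0, 0.0, 1.0]])
I = np.eye(3)
E2 = null_space(A - 2 * I)   # orthonormal basis of the eigenspace for lambda = 2
E1 = null_space(A - 1 * I)   # orthonormal basis of the eigenspace for lambda = 1
print(E2.shape[1])           # 2  -> geometric multiplicity of lambda = 2
print(E1.shape[1])           # 1  -> geometric multiplicity of lambda = 1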
Important Concepts
Algebraic vs. Geometric Multiplicity
- Algebraic multiplicity: The multiplicity of \( \lambda \) as a root of the characteristic polynomial
- Geometric multiplicity: The dimension of the eigenspace \( E_\lambda \)
Key Fact: For any eigenvalue:
\[1 \leq \text{geometric multiplicity} \leq \text{algebraic multiplicity}\]
Example Showing Difference
Let \( A = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix} \) (Jordan block)
Characteristic polynomial: \( (2-\lambda)^2 \)
Algebraic multiplicity of \( \lambda = 2 \) is 2
Find eigenvectors:
\[A - 2I = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\]
Only one independent eigenvector: \( \begin{bmatrix} 1 \\ 0 \end{bmatrix} \)
Geometric multiplicity = 1
Diagonalizability
A matrix is diagonalizable if and only if for every eigenvalue:
\[\text{geometric multiplicity} = \text{algebraic multiplicity}\]
This means we can find a basis of eigenvectors.
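A quick numerical illustration of this criterion: the Jordan block above has a one-dimensional eigenspace for its double eigenvalue, so it is not diagonalizable, while the 3×3 matrix from the eigenspace examples has matching multiplicities and can be reconstructed as \( PDP^{-1} \). A sketch:
import numpy as np
from scipy.linalg import null_space
J = np.array([[2.0, 1.0], [0.0, 2.0]])           # Jordan block
print(null_space(J - 2 * np.eye(2)).shape[1])    # 1 < 2 (algebraic)  -> not diagonalizable
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 2.0, 1.0],
              [-1.0, 0.0, 1.0]])
vals, P = np.linalg.eig(A)                       # columns of P are eigenvectors
D = np.diag(vals)
print(np.allclose(P @ D @ np.linalg.inv(P), A))  # True: A = P D P^{-1}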
Applications of Eigenvalues and Eigenvectors
- Differential Equations: Solving systems \( \frac{d\mathbf{x}}{dt} = A\mathbf{x} \)
- Markov Chains: Steady-state distributions
- Principal Component Analysis (PCA): Data compression and feature extraction
- Vibration Analysis: Natural frequencies of mechanical systems
- Quantum Mechanics: Energy levels and stationary states
- Computer Graphics: Orientation, scaling, and principal axes
- Network Analysis: Importance ranking of nodes (PageRank)
Summary
- Eigenvalues tell us about the scaling factors in special directions
- Eigenvectors identify those special directions
- Characteristic polynomial is the tool for finding eigenvalues
- Eigenspaces contain all eigenvectors for a given eigenvalue
- The relationship between algebraic and geometric multiplicity determines diagonalizability
Spectral theory provides deep insights into the structure and behavior of linear transformations, with applications spanning virtually every field of science and engineering.
Singular Value Decomposition (SVD)
- Diagonalizing Matrices and the Spectral Theorem
Matrix Diagonalization Recap
A square matrix \( A \) is diagonalizable if there exists an invertible matrix \( P \) and diagonal matrix \( D \) such that:
\[A = PDP^{-1}\]
The columns of \( P \) are eigenvectors of \( A \), and the diagonal entries of \( D \) are the corresponding eigenvalues.
Limitation: Not all square matrices are diagonalizable, and this eigendecomposition is defined only for square matrices.
The Spectral Theorem
Theorem (Spectral Theorem for Real Symmetric Matrices)
If \( A \) is a real symmetric matrix (\( A = A^T \)), then:
- All eigenvalues of \( A \) are real
- \( A \) is orthogonally diagonalizable
- There exists an orthogonal matrix \( Q \) (\( Q^{-1} = Q^T \)) and diagonal matrix \( \Lambda \) such that:
\[A = Q\Lambda Q^T\]
Example: Symmetric Matrix Diagonalization
Let \( A = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \)
Eigenvalues: \( \lambda_1 = 2, \lambda_2 = 4 \)
Orthonormal eigenvectors:
- For \( \lambda_1 = 2 \): \( \mathbf{q}_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix} \)
- For \( \lambda_2 = 4 \): \( \mathbf{q}_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \)
Orthogonal diagonalization:
\[Q = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}, \quad \Lambda = \begin{bmatrix} 2 & 0 \\ 0 & 4 \end{bmatrix}\]
\[A = Q\Lambda Q^T = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 4 \end{bmatrix} \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}\]
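For real symmetric matrices, np.linalg.eigh computes exactly this orthogonal diagonalization (eigenvalues in ascending order, orthonormal eigenvectors as the columns of \( Q \)). A minimal check of the example above:
import numpy as np
A = np.array([[3.0, 1.0], [1.0, 3.0]])
lam, Q = np.linalg.eigh(A)                       # lam = [2. 4.], Q has orthonormal columns
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))    # True: A = Q Lambda Q^T
print(np.allclose(Q.T @ Q, np.eye(2)))           # True: Q is orthogonal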
- Singular Value Decomposition: Definition and Properties
The Fundamental Idea
While the spectral theorem applies only to symmetric matrices, SVD provides a similar decomposition for any matrix (square or rectangular, symmetric or not).
Theorem (Singular Value Decomposition)
For any \( m \times n \) real matrix \( A \) with rank \( r \), there exist:
- An \( m \times m \) orthogonal matrix \( U \)
- An \( n \times n \) orthogonal matrix \( V \)
- An \( m \times n \) "diagonal" matrix \( \Sigma \) with non-negative real entries on the diagonal
Such that:
\[A = U\Sigma V^T\]
The Components
- \( U \) (Left singular vectors): Columns are eigenvectors of \( AA^T \)
- \( \Sigma \) (Singular values): Leading diagonal entries \( \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0 \); any remaining diagonal entries are zero
- \( V \) (Right singular vectors): Columns are eigenvectors of \( A^TA \)
Computing SVD Step by Step
Example: Compute SVD of \( A = \begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix} \)
Step 1: Compute \( A^TA \) and \( AA^T \)
\[A^TA = \begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix} = \begin{bmatrix} 9 & 0 \\ 0 & 4 \end{bmatrix}\]
\[AA^T = \begin{bmatrix} 9 & 0 \\ 0 & 4 \end{bmatrix}\]
Step 2: Find eigenvalues and eigenvectors of \( A^TA \)
Eigenvalues: \( \lambda_1 = 9, \lambda_2 = 4 \)
Eigenvectors: \( \mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \mathbf{v}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \)
Step 3: Compute singular values
\[\sigma_1 = \sqrt{9} = 3, \quad \sigma_2 = \sqrt{4} = 2\]
\[\Sigma = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}\]
Step 4: Compute \( U \) vectors
For each \( \mathbf{v}_i \), compute \( \mathbf{u}_i = \frac{1}{\sigma_i}A\mathbf{v}_i \):
\[\mathbf{u}_1 = \frac{1}{3}\begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 3 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}\]
\[\mathbf{u}_2 = \frac{1}{2}\begin{bmatrix} 3 & 0 \\ 0 & -2 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 0 \\ -2 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \end{bmatrix}\]
Step 5: Construct SVD
\[U = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}, \quad V = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\]
\[A = U\Sigma V^T = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\]
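The same factorization can be checked numerically; note that an SVD routine may choose different signs for the columns of \( U \) and \( V \) than the hand computation, but the product \( U\Sigma V^T \) is always \( A \). A sketch:
import numpy as np
A = np.array([[3.0, 0.0], [0.0, -2.0]])
U, s, Vt = np.linalg.svd(A)                      # s holds the singular values as a 1-D array
print(s)                                         # [3. 2.]
print(np.allclose(U @ np.diag(s) @ Vt, A))       # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(2)))           # True: U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(2)))         # True: V is orthogonal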
Properties of SVD
- Singular values are always non-negative and ordered: \( \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0 \)
- Rank revelation: Number of non-zero singular values = rank of \( A \)
- Matrix norm: \( \|A\|_2 = \sigma_1 \) (the largest singular value)
- Low-rank approximation: SVD provides the best low-rank approximation of any matrix
- The Reduced SVD
For practical applications, we often use the economy SVD or reduced SVD :
If \( A \) is \( m \times n \) with rank \( r \), then:
\[A = U_r \Sigma_r V_r^T\]
where:
- \( U_r \) is \( m \times r \) (first \( r \) columns of \( U \))
- \( \Sigma_r \) is \( r \times r \) diagonal
- \( V_r \) is \( n \times r \) (first \( r \) columns of \( V \))
Example: Rectangular Matrix
Let \( A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} \) (3×2 matrix, rank 2)
Full SVD: \( U \) is 3×3, \( \Sigma \) is 3×2, \( V \) is 2×2
Reduced SVD: \( U_r \) is 3×2, \( \Sigma_r \) is 2×2, \( V_r \) is 2×2
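In NumPy, the difference between the full and reduced factorizations is controlled by the full_matrices flag; a sketch for the 3×2 example above:
import numpy as np
A = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
U_full, s, Vt = np.linalg.svd(A, full_matrices=True)
U_red, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U_full.shape, U_red.shape)                 # (3, 3) (3, 2)
print(s.shape, Vt.shape)                         # (2,) (2, 2)
print(np.allclose(U_red @ np.diag(s) @ Vt, A))   # True: the reduced SVD already reconstructs A exactly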
- Applications of SVD in Data Science
- Low-Rank Approximation (Data Compression)
Theorem (Eckart-Young-Mirsky)
The best rank-\( k \) approximation \( A_k \) to a matrix \( A \) (in both the spectral and Frobenius norms) is given by:
\[A_k = \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^T\]
where we take the first \( k \) singular values and vectors.
Example: Image Compression
Consider a grayscale image as a matrix \( A \) (200×300 pixels, rank ≈ 200).
Full storage: 200 × 300 = 60,000 numbers
Rank-50 approximation:
- Store: 50 singular values + 50 left singular vectors (200×50) + 50 right singular vectors (300×50)
- Total: 50 + 10,000 + 15,000 = 25,050 numbers
- Compression ratio: ≈ 2.4:1
Visual result: Much of the visible image structure is typically preserved even though only a fraction of the original storage is used.
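The storage arithmetic generalizes directly: a rank-\( k \) approximation of an \( m \times n \) matrix needs \( k(m + n + 1) \) numbers instead of \( mn \). A small sketch of the count used above (the 200×300 image is a hypothetical size, not real data):
m, n, k = 200, 300, 50
full_storage = m * n                              # 60,000 numbers
compressed = k * (m + n + 1)                      # 50 * 501 = 25,050 numbers
print(full_storage, compressed)                   # 60000 25050
print(round(full_storage / compressed, 1))        # 2.4  -> compression ratio of about 2.4:1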
- Principal Component Analysis (PCA)
SVD is the computational engine behind PCA:
Given data matrix \( X \) (samples × features), centered:
- Compute SVD: \( X = U\Sigma V^T \)
- Principal components = columns of \( V \) (right singular vectors)
- Explained variance of each component \( \propto \sigma_i^2 \)
Example: Dimensionality Reduction
Dataset with 100 features, but only 5 principal components capture 95% of variance.
Reduced representation: Use first 5 columns of \( U\Sigma \) as new features.
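A minimal PCA-via-SVD sketch along these lines; the data matrix here is random and purely illustrative, and column-centering is the essential preprocessing step:
import numpy as np
X = np.random.rand(100, 10)                       # 100 samples, 10 features (illustrative data)
Xc = X - X.mean(axis=0)                           # center each feature
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                                   # rows of Vt are the principal directions
explained_variance = s**2 / (len(Xc) - 1)         # variance captured by each component
scores = U[:, :5] * s[:5]                         # first 5 principal-component scores (reduced features)
print(explained_variance / explained_variance.sum())   # explained-variance ratios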
- Recommendation Systems
In collaborative filtering, the user-item rating matrix \( R \) is approximated via SVD:
\[R \approx U_k \Sigma_k V_k^T\]
Interpretation:
- \( U \): User latent factors
- \( \Sigma \): Importance of each latent dimension
- \( V \): Item latent factors
Prediction: Missing ratings estimated from the low-rank approximation.
- Latent Semantic Analysis (Text Mining)
Document-term matrix \( A \):
- Rows: Documents
- Columns: Terms
- Entries: Term frequencies
SVD: \( A = U\Sigma V^T \)
Interpretation:
- \( U \): Document-concept similarities
- \( V \): Term-concept similarities
- \( \Sigma \): Concept strengths
Application: Semantic search, document clustering, query expansion.
- Noise Reduction
Small singular values often correspond to noise.
Strategy: Set small singular values to zero, then reconstruct:
\[A_{\text{clean}} = \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^T\]
where \( k \) is chosen to keep "signal" and discard "noise."
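A sketch of this truncation strategy on synthetic data: the signal is a low-rank matrix, a small amount of noise is added, and keeping only the dominant singular values recovers something closer to the original (the rank and noise level here are illustrative choices):
import numpy as np
rng = np.random.default_rng(0)
signal = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))    # rank-3 "signal"
noisy = signal + 0.05 * rng.standard_normal((50, 40))                   # signal plus small noise
U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
k = 3                                                                   # keep the dominant components
clean = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.linalg.norm(clean - signal) < np.linalg.norm(noisy - signal))  # typically True: denoised copy is closer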
- Pseudoinverse and Least Squares
For any matrix \( A \), the Moore-Penrose pseudoinverse is:
\[A^+ = V\Sigma^+ U^T\]
where \( \Sigma^+ \) is obtained by transposing \( \Sigma \) and replacing each non-zero singular value with its reciprocal.
Application: Solving \( A\mathbf{x} = \mathbf{b} \) for non-square or singular matrices.
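In NumPy, np.linalg.pinv computes the pseudoinverse this way (via the SVD, treating very small singular values as zero). A sketch of using it to solve an overdetermined system in the least-squares sense:
import numpy as np
A = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])           # 3 equations, 2 unknowns
b = np.array([2.0, 1.0, 1.0])
x = np.linalg.pinv(A) @ b                                     # least-squares solution via A^+
print(x)                                                      # [1. 1.]
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True: agrees with lstsq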
- Computational Example: SVD for Data Analysis
Problem: Analyze customer-product purchase patterns
Let \( A = \begin{bmatrix} 5 & 3 & 0 & 1 \\ 4 & 0 & 0 & 1 \\ 1 & 1 & 0 & 5 \\ 1 & 0 & 0 & 4 \\ 0 & 1 & 5 & 4 \end{bmatrix} \)
(Rows: customers, Columns: products, Entries: purchase counts)
SVD reveals:
- \( \sigma_1 = 9.5 \): Dominant pattern (general purchasing tendency)
- \( \sigma_2 = 6.5 \): Second pattern (contrast between product types)
- \( \sigma_3 = 3.2 \): Third pattern (specific customer segments)
- \( \sigma_4 = 2.1 \): Noise/individual variations
Rank-2 approximation captures 85% of variance:
\[A_2 = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T\]
Interpretation:
- \( \mathbf{v}_1 \): General product popularity weights
- \( \mathbf{v}_2 \): Distinguishes between product categories
- \( \mathbf{u}_1 \): Overall customer engagement level
- \( \mathbf{u}_2 \): Customer preferences for different product types
Key Insights from SVD
- Dimensionality: Reveals the intrinsic dimensionality of data
- Importance: Singular values quantify the importance of each pattern
- Orthogonality: Different patterns capture independent aspects of the data
- Optimality: SVD provides mathematically optimal low-rank approximations
Implementation Notes
In Python:
import numpy as np
from scipy.linalg import svd
import matplotlib.pyplot as plt
# Example matrix (you can replace this with your own)
A = np.random.rand(6, 4)
# --- Step 1: Compute Singular Value Decomposition (SVD) ---
# A = U Σ Vᵀ
U, s, Vt = svd(A, full_matrices=False)
# --- Step 2: Choose rank-k approximation ---
k = 2 # desired rank
A_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
# --- Step 3: Evaluate reconstruction quality ---
fro_error = np.linalg.norm(A - A_approx, 'fro')
relative_error = fro_error / np.linalg.norm(A, 'fro')
print(f"Original matrix shape: {A.shape}")
print(f"Rank-{k} approximation error (Frobenius norm): {fro_error:.6f}")
print(f"Relative error: {relative_error*100:.2f}%")
# --- Step 4 (optional): visualize the singular value decay ---
plt.figure(figsize=(6, 4))
plt.semilogy(s, 'o-', label='Singular values')
plt.axvline(k-1, color='r', linestyle='--', label=f'k = {k}')
plt.title("Singular Value Spectrum")
plt.xlabel("Index")
plt.ylabel("Singular Value (log scale)")
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
# --- Step 5 (optional): verify the reconstruction by printing (for small matrices) ---
print("\nOriginal A:\n", np.round(A, 3))
print("\nRank-k Approximation A_approx:\n", np.round(A_approx, 3))
SVD remains one of the most powerful tools in linear algebra, bridging theoretical mathematics with practical applications across data science, engineering, and scientific computing. Its ability to extract fundamental patterns from complex data makes it indispensable in the age of big data.