The world of matrices is built upon a foundation of familiar arithmetic. We can add, subtract, and multiply matrices, and for a special few, we can even find an inverse. These operations, which mirror the rules of scalar algebra, are powerful and form the bedrock of linear algebra. Yet they invite a deeper and more provocative question. We can easily compute the square of a matrix, $\mathbf{A}^2$, but can we reverse the process and find its square root, $\sqrt{\mathbf{A}}$? Can we go further and define the exponential of a matrix, $e^\mathbf{A}$, or its logarithm? This line of inquiry takes us from the concrete realm of matrix arithmetic to the more abstract and powerful concept of a general "function of a matrix."
The motivation for this leap is not merely theoretical curiosity. It is driven by the need to describe and analyze the dynamical systems that are central to countless scientific and technical fields. The evolution of these systems over time is often governed by principles that are best expressed not through scalar functions, but through the elegant and compact language of matrix functions. By learning how to apply familiar functions to matrices, we create a powerful "operator calculus." This calculus allows us to analyze and solve complex, multi-dimensional systems with the same conceptual tools we use for single-variable functions, providing a profound and indispensable extension to the theory of linear transformations.
The Natural Starting Point — Polynomials of Matrices
Before we can hope to define a general function of a matrix, such as an exponential or a sine, we must begin with the most straightforward case: a polynomial. Applying a polynomial to a matrix is a natural extension of basic matrix arithmetic. For a scalar polynomial $p(t) = c_k t^k + \dots + c_1 t + c_0$, we can define the corresponding matrix polynomial by simply substituting the matrix $\mathbf{A}$ for the variable $t$. The only subtlety is that the constant term $c_0$ must be replaced by $c_0\mathbf{I}$, where $\mathbf{I}$ is the identity matrix, to ensure that the expression is a sum of matrices. This gives us the formal definition $p(\mathbf{A}) = c_k \mathbf{A}^k + \dots + c_1 \mathbf{A} + c_0\mathbf{I}$.
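To make the definition concrete, here is a minimal NumPy sketch (the helper name `matrix_polynomial`, the coefficients, and the example matrix are illustrative choices, not taken from the text) that evaluates $p(\mathbf{A})$ by Horner's scheme, with the constant term entering as $c_0\mathbf{I}$:

```python
import numpy as np

def matrix_polynomial(coeffs, A):
    """Evaluate p(A) = c_k A^k + ... + c_1 A + c_0 I by Horner's scheme.

    `coeffs` lists the scalar coefficients [c_k, ..., c_1, c_0],
    highest power first; the constant term enters as c_0 * I.
    """
    n = A.shape[0]
    result = np.zeros((n, n))
    for c in coeffs:
        result = result @ A + c * np.eye(n)
    return result

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
# p(t) = t^2 + 2t + 3, so p(A) = A^2 + 2A + 3I
print(matrix_polynomial([1.0, 2.0, 3.0], A))
```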
This simple definition already leads to a remarkable and powerful result that lies at the heart of matrix theory: the Cayley-Hamilton Theorem. This theorem states that every square matrix satisfies its own characteristic equation.
Theorem: The Cayley-Hamilton Theorem
If $p(t) = \det(t\mathbf{I} - \mathbf{A})$ is the characteristic polynomial of an $n \times n$ matrix $\mathbf{A}$, then $p(\mathbf{A}) = \mathbf{0}$, where $\mathbf{0}$ is the $n \times n$ zero matrix.
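A quick numerical sanity check of the theorem is easy to run. The sketch below (an illustration only; the random matrix and seed are arbitrary) uses `numpy.poly` to obtain the characteristic polynomial's coefficients and confirms that evaluating it at $\mathbf{A}$ gives the zero matrix up to round-off:

```python
import numpy as np

# Illustrative check of the Cayley-Hamilton theorem on a random matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# np.poly(A) returns the characteristic polynomial's coefficients,
# monic and ordered from the highest power down to the constant term.
coeffs = np.poly(A)

# Evaluate p(A) by Horner's scheme, with the constant term as c_0 * I.
pA = np.zeros_like(A)
for c in coeffs:
    pA = pA @ A + c * np.eye(4)

print(np.max(np.abs(pA)))   # ~ 0, up to floating-point round-off
```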
The Cayley-Hamilton theorem tells us that for any given matrix, there exists at least one polynomial that "annihilates" it. This naturally leads to the question of whether there is a smallest such polynomial. This is answered by the concept of the minimal polynomial.
Definition: The Minimal Polynomial
The minimal polynomial of a square matrix $\mathbf{A}$, denoted $m(t)$, is the unique monic polynomial of the least degree such that $m(\mathbf{A}) = \mathbf{0}$.
The minimal polynomial provides a deeper insight into the matrix's structure by refining the information contained in the characteristic polynomial.
Proposition: Relationship of Minimal and Characteristic Polynomials
For any square matrix $\mathbf{A}$, its minimal polynomial $m(t)$ divides its characteristic polynomial $p(t)$. Both polynomials are invariants under similarity transformations. Consequently, the roots of $m(t)$ are precisely the eigenvalues of $\mathbf{A}$. The multiplicity of an eigenvalue in $p(t)$ is its algebraic multiplicity, while its multiplicity in $m(t)$ corresponds to the size of the largest Jordan block associated with that eigenvalue.
This relationship leads to a profound connection: the minimal and characteristic polynomials are identical if and only if there is exactly one Jordan block for each distinct eigenvalue. This is equivalent to saying that the geometric multiplicity of every eigenvalue is exactly 1, a property of so-called non-derogatory matrices. The minimal polynomial also provides a direct test for invertibility.
Theorem: The Invertibility Criterion
A square matrix $\mathbf{A}$ is invertible if and only if the constant term of its minimal polynomial $m(t)$ is non-zero.
This is because the constant term of the minimal polynomial is, up to a sign, the product of its roots (the eigenvalues), and a matrix is invertible if and only if zero is not an eigenvalue. Most importantly, the structure of the minimal polynomial provides a powerful and elegant test for diagonalizability.
Theorem: The Diagonalizability Criterion
A square matrix $\mathbf{A}$ is diagonalizable if and only if its minimal polynomial $m(t)$ has no repeated roots (i.e., it is a product of distinct linear factors).
This condition is equivalent to stating that the largest Jordan block for every eigenvalue has size 1, which is the very definition of a diagonalizable matrix. Finally, the relationship between a matrix and its polynomials extends to its spectrum in a very direct and elegant way.
Theorem: Eigenvalues of a Matrix Polynomial
If $\lambda$ is an eigenvalue of a matrix $\mathbf{A}$ with corresponding eigenvector $\mathbf{x}$, then $p(\lambda)$ is an eigenvalue of the matrix $p(\mathbf{A})$ with the same eigenvector $\mathbf{x}$.
This result is fundamental, as it shows that the action of a matrix polynomial is perfectly simple and predictable when viewed from the perspective of an eigenvector. It forms the basis of the "functional calculus," where applying a function to a matrix can be understood as simply applying the function to its eigenvalues. This insight is the first major step toward a general theory of matrix functions.
Defining General Functions — Three Perspectives
With the foundation of matrix polynomials in place, we can now tackle the central challenge: how to define $f(\mathbf{A})$ for a more general function $f$, such as an exponential, logarithm, or trigonometric function. There are three primary perspectives for constructing such a definition, each with its own advantages and level of generality. It is a crucial point that all three of these methods are designed to be consistent with one another; for any matrix where multiple definitions apply, they will yield the same unique result for $f(\mathbf{A})$.
The first and most intuitive approach is to use a Taylor series. For any function $f(t)$ that is analytic—meaning it can be represented by a convergent power series, $f(t) = \sum_{k=0}^\infty c_k t^k$—we can naturally define the matrix function by substituting the matrix $\mathbf{A}$ into the series: $f(\mathbf{A}) = \sum_{k=0}^\infty c_k \mathbf{A}^k$. The most famous example of this is the matrix exponential, defined as $e^\mathbf{A} = \mathbf{I} + \mathbf{A} + \frac{1}{2!}\mathbf{A}^2 + \dots$. This definition is powerful because it builds directly from the familiar concept of a power series, but its direct use is limited to analytic functions, and the matrix series is guaranteed to converge only when the spectral radius of $\mathbf{A}$ is smaller than the radius of convergence of the scalar series (for the exponential, whose series converges everywhere, this imposes no restriction).
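As an illustration, the sketch below (the helper name `expm_taylor` and the example matrix are ours, not from the text) compares a truncated Taylor series against SciPy's built-in `expm`:

```python
import numpy as np
from scipy.linalg import expm

def expm_taylor(A, terms=30):
    """Approximate e^A by a partial sum of its Taylor series, sum_k A^k / k!.

    A plain partial sum is adequate for small, well-scaled matrices;
    scipy.linalg.expm is the robust choice in practice.
    """
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k            # builds A^k / k! incrementally
        result = result + term
    return result

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])            # generator of a plane rotation
print(np.allclose(expm_taylor(A), expm(A)))   # True
```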
The second approach provides a more algebraic and geometrically insightful definition, but it is restricted to the important class of diagonalizable matrices. If a matrix $\mathbf{A}$ can be written as $\mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}^{-1}$, where $\mathbf{D}$ is a diagonal matrix of eigenvalues, we can define the matrix function through this decomposition. This is the essence of the functional calculus.
Definition: Function of a Diagonalizable Matrix
If $\mathbf{A}$ is a diagonalizable matrix with $\mathbf{A} = \mathbf{P}\mathbf{D}\mathbf{P}^{-1}$, where $\mathbf{D} = \text{diag}(\lambda_1, \dots, \lambda_n)$, and $f$ is a scalar function defined on the eigenvalues of $\mathbf{A}$, then:
$$f(\mathbf{A}) = \mathbf{P}\,f(\mathbf{D})\,\mathbf{P}^{-1} = \mathbf{P}\,\text{diag}\big(f(\lambda_1), \dots, f(\lambda_n)\big)\,\mathbf{P}^{-1}.$$
This definition is motivated by the desire for consistency with the simplest case: polynomials. Consider the square of the matrix, $\mathbf{A}^2 = (\mathbf{P}\mathbf{D}\mathbf{P}^{-1})(\mathbf{P}\mathbf{D}\mathbf{P}^{-1}) = \mathbf{P}\mathbf{D}(\mathbf{P}^{-1}\mathbf{P})\mathbf{D}\mathbf{P}^{-1} = \mathbf{P}\mathbf{D}^2\mathbf{P}^{-1}$. By extension, any power $\mathbf{A}^k$ is equal to $\mathbf{P}\mathbf{D}^k\mathbf{P}^{-1}$. When we apply a polynomial $p(t)$ to $\mathbf{A}$, we get a sum of such terms, which simplifies to $p(\mathbf{A}) = \mathbf{P}p(\mathbf{D})\mathbf{P}^{-1}$. Since $\mathbf{D}$ is diagonal, $p(\mathbf{D})$ is simply the diagonal matrix where the polynomial has been applied to each eigenvalue. This shows that for the simple case of polynomials, the function naturally "passes through" the change-of-basis matrices to act directly on the eigenvalues. By elevating this required property to a definition for any general function $f$, we ensure that all matrix functions behave in a way that is consistent with basic matrix algebra. This method is elegant because it reduces the problem of applying a function to a matrix to the much simpler problem of applying the function to its individual eigenvalues, revealing that, in the eigenbasis, the action of $f(\mathbf{A})$ is simply to apply the scalar function $f$ to each of the fundamental modes of the transformation.
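The sketch below (the helper name and the example matrix are illustrative) implements this recipe directly with `numpy.linalg.eig` and, assuming the matrix is diagonalizable, checks it against `scipy.linalg.expm` for $f = \exp$:

```python
import numpy as np
from scipy.linalg import expm

def func_of_diagonalizable(f, A):
    """Compute f(A) = P diag(f(lambda_1), ..., f(lambda_n)) P^{-1}.

    Assumes A is diagonalizable; np.linalg.eig supplies the eigenvalues
    and the change-of-basis matrix P.
    """
    eigvals, P = np.linalg.eig(A)
    return P @ np.diag(f(eigvals)) @ np.linalg.inv(P)

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])             # diagonalizable, eigenvalues 5 and 2
print(np.allclose(func_of_diagonalizable(np.exp, A), expm(A)))   # True
```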
The third and most powerful approach provides a definition that works for any square matrix, including those that are not diagonalizable. This general definition is achieved using the Jordan Canonical Form. Any square matrix $\mathbf{A}$ can be expressed as $\mathbf{A} = \mathbf{P}\mathbf{J}\mathbf{P}^{-1}$, where $\mathbf{J}$ is a block diagonal matrix composed of Jordan blocks. We can then define $f(\mathbf{A}) = \mathbf{P}f(\mathbf{J})\mathbf{P}^{-1}$. The challenge is reduced to defining a function of a single Jordan block, $\mathbf{J}_k$. This is accomplished by a formula that involves not only the function $f$ itself, but also its derivatives.
Definition: Function of a Jordan Block
For a Jordan block $\mathbf{J}_k$ of size $m$ with eigenvalue $\lambda$, and a function $f$ that is sufficiently differentiable at $\lambda$, the matrix $f(\mathbf{J}_k)$ is defined as the upper triangular matrix:
$$f(\mathbf{J}_k) = \begin{pmatrix} f(\lambda) & f'(\lambda) & \frac{f''(\lambda)}{2!} & \cdots & \frac{f^{(m-1)}(\lambda)}{(m-1)!} \\ & f(\lambda) & f'(\lambda) & \ddots & \vdots \\ & & \ddots & \ddots & \frac{f''(\lambda)}{2!} \\ & & & f(\lambda) & f'(\lambda) \\ & & & & f(\lambda) \end{pmatrix}.$$
While this formula appears complex, it is not arbitrary. It is precisely the definition required to be consistent with the Taylor series approach. By writing the Jordan block as the sum of a diagonal and a nilpotent part, $\mathbf{J}_k = \lambda\mathbf{I} + \mathbf{N}$, and formally applying the Taylor series of $f$ around $\lambda$, the nilpotent property of $\mathbf{N}$ causes the infinite series to terminate exactly into the finite structure shown above. This general definition is profound. It reveals that for the simple, diagonalizable parts of a transformation (the diagonal of the Jordan block), the function acts directly on the eigenvalues. For the more complex, shearing parts of a transformation (the superdiagonal of the Jordan block), the function's action is governed by its local derivatives. This provides a complete and consistent way to define any sufficiently differentiable function for any square matrix.
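The sketch below (illustrative helper names, with $f = \exp$ so that every derivative at $\lambda$ equals $e^\lambda$) builds $f(\mathbf{J}_k)$ from the formula above and checks it against `scipy.linalg.expm` applied to the Jordan block itself:

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

def f_of_jordan_block(lam, m, derivs):
    """Apply f to an m x m Jordan block with eigenvalue lam.

    `derivs[j]` must equal f^{(j)}(lam) for j = 0, ..., m-1; the j-th
    superdiagonal of the result is f^{(j)}(lam) / j!.
    """
    F = np.zeros((m, m))
    for j in range(m):
        F += np.diag(np.full(m - j, derivs[j] / factorial(j)), k=j)
    return F

lam, m = 2.0, 4
J = lam * np.eye(m) + np.diag(np.ones(m - 1), k=1)    # the Jordan block

# For f = exp, every derivative at lam equals e^lam.
F = f_of_jordan_block(lam, m, [np.exp(lam)] * m)
print(np.allclose(F, expm(J)))                         # True
```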
These carefully constructed definitions lead to a set of powerful and consistent algebraic rules for manipulating matrix functions.
Theorem: Properties of General Matrix Functions
Let $f$ be a function defined on the spectrum of a matrix $\mathbf{A}$. The matrix function $f(\mathbf{A})$ has the following properties:
- Commutativity with A: $f(\mathbf{A})$ commutes with $\mathbf{A}$, i.e., $\mathbf{A}f(\mathbf{A}) = f(\mathbf{A})\mathbf{A}$.
- Similarity Invariance: For any invertible matrix $\mathbf{P}$, $f(\mathbf{P}\mathbf{A}\mathbf{P}^{-1}) = \mathbf{P}f(\mathbf{A})\mathbf{P}^{-1}$.
- Spectral Mapping: The eigenvalues of $f(\mathbf{A})$ are precisely $f(\lambda_1), f(\lambda_2), \dots, f(\lambda_n)$, where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $\mathbf{A}$.
- Commutativity Preservation: If a matrix $\mathbf{B}$ commutes with $\mathbf{A}$, then $\mathbf{B}$ also commutes with $f(\mathbf{A})$.
This theorem provides the "rules of the game" for the operator calculus. The Spectral Mapping Theorem, in particular, is a cornerstone result, generalizing our earlier finding for polynomials. It confirms that no matter how complex the matrix $\mathbf{A}$ is, the spectrum of $f(\mathbf{A})$ can be found by simply applying the scalar function to the spectrum of $\mathbf{A}$.
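A numerical illustration of the spectral mapping property, taking $f = \exp$ and an arbitrary random matrix (sorting both spectra the same way so they can be compared entry by entry):

```python
import numpy as np
from scipy.linalg import expm

# Illustrative check of spectral mapping for f = exp on a random matrix.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))

spec_A = np.linalg.eigvals(A)          # spectrum of A
spec_fA = np.linalg.eigvals(expm(A))   # spectrum of f(A)

# np.sort orders complex numbers by real part, then imaginary part,
# so the two spectra can be compared entrywise after sorting.
print(np.allclose(np.sort(np.exp(spec_A)), np.sort(spec_fA)))   # True
```

A final useful property relates to block matrices.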
Proposition: Functions of Block Diagonal Matrices
If $\mathbf{A}$ is a block diagonal matrix, $\mathbf{A} = \text{diag}(\mathbf{A}_1, \mathbf{A}_2, \dots, \mathbf{A}_k)$, then $f(\mathbf{A})$ is also block diagonal, and is given by:
$$f(\mathbf{A}) = \text{diag}\big(f(\mathbf{A}_1), f(\mathbf{A}_2), \dots, f(\mathbf{A}_k)\big).$$
The final unifying result of this theory is perhaps the most surprising. It reveals that any matrix function, no matter how complex, is ultimately equivalent to a simple polynomial of the original matrix.
Theorem: The Polynomial Representation of a Matrix Function
For any square matrix $\mathbf{A}$ and any function $f$ that is sufficiently differentiable on its spectrum, there exists a *unique* polynomial $p(t)$ of degree less than the degree of the minimal polynomial of $\mathbf{A}$, such that $f(\mathbf{A}) = p(\mathbf{A})$.
This theorem provides the ultimate statement of consistency. It guarantees that the definitions via Taylor series, diagonalization, and the Jordan form all produce the same unique result because they must all agree on polynomials. This is a profound insight: even a seemingly transcendental function like $e^\mathbf{A}$ can be perfectly represented by a finite-degree polynomial in $\mathbf{A}$, a fact that is central to both the theory and practical computation of matrix functions.
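The sketch below makes this tangible for a matrix with distinct eigenvalues, so that the minimal polynomial has degree $n$ and plain Lagrange interpolation of $f$ on the spectrum suffices (no derivative conditions are needed); the example matrix is illustrative:

```python
import numpy as np
from scipy.linalg import expm
from scipy.interpolate import lagrange

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])                 # distinct eigenvalues 5 and 2
lams = np.linalg.eigvals(A)

# Interpolate f = exp on the spectrum: p has degree n - 1 = 1 here.
p = lagrange(lams, np.exp(lams))
coeffs = p.coeffs                           # highest power first

# Evaluate p(A) by Horner's scheme, constant term entering as c_0 * I.
pA = np.zeros_like(A)
for c in coeffs:
    pA = pA @ A + c * np.eye(2)

print(np.allclose(pA, expm(A)))             # True: e^A equals p(A)
```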
Important Matrix Functions and Their Properties
Having established a general theory, we now turn our attention to the specific functions that are cornerstones of applied mathematics. Among these, the matrix exponential stands out as the undisputed star player, but others, such as the logarithm, trigonometric functions, and the square root, also possess unique and important properties.
The matrix exponential, $e^\mathbf{A}$, is the gateway to understanding continuous dynamical systems. Its most vital application is in solving systems of linear ordinary differential equations. The solution to the initial value problem $\frac{d\mathbf{x}}{dt} = \mathbf{A}\mathbf{x}$ with initial condition $\mathbf{x}(0) = \mathbf{x}_0$ is given elegantly by $\mathbf{x}(t) = e^{\mathbf{A}t}\mathbf{x}_0$. The matrix exponential acts as a "propagator," evolving the state of the system forward in time. This function possesses several key properties that are analogous to, but subtly different from, its scalar counterpart.
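As a sketch of the propagator in action (the system matrix, initial state, and tolerances below are illustrative choices), one can compare $e^{\mathbf{A}t}\mathbf{x}_0$ with a direct numerical integration of the system:

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0],
              [-2.0, -0.3]])           # a lightly damped oscillator
x0 = np.array([1.0, 0.0])
t_final = 5.0

# Closed-form evolution via the matrix exponential propagator.
x_exact = expm(A * t_final) @ x0

# Direct numerical integration of dx/dt = A x for comparison.
sol = solve_ivp(lambda t, x: A @ x, (0.0, t_final), x0,
                rtol=1e-10, atol=1e-12)
print(np.allclose(x_exact, sol.y[:, -1], atol=1e-6))   # True
```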
Proposition: Properties of the Matrix Exponential
For any square matrices $\mathbf{A}$ and $\mathbf{B}$:
- Invertibility: $e^\mathbf{A}$ is always invertible, and its inverse is $(e^\mathbf{A})^{-1} = e^{-\mathbf{A}}$.
- Determinant: The determinant of the exponential is the exponential of the trace: $\det(e^\mathbf{A}) = e^{\text{tr}(\mathbf{A})}$, a consequence of Jacobi's formula.
- Sum of Matrices: The familiar rule $e^{\mathbf{A}+\mathbf{B}} = e^\mathbf{A}e^\mathbf{B}$ holds whenever the matrices commute, i.e., $\mathbf{AB} = \mathbf{BA}$; for non-commuting matrices it fails in general (commutativity is sufficient, but not strictly necessary).
The failure of the sum rule for non-commuting matrices is a critical point; for general matrices, the order of operations matters profoundly, as the sketch below shows.
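A minimal illustration with two standard non-commuting matrices (and a commuting diagonal pair for contrast); the specific matrices are arbitrary examples:

```python
import numpy as np
from scipy.linalg import expm

# Non-commuting pair: the sum rule fails.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0, 0.0],
              [1.0, 0.0]])
print(np.allclose(A @ B, B @ A))                      # False
print(np.allclose(expm(A + B), expm(A) @ expm(B)))    # False

# Commuting diagonal pair: the sum rule holds.
C = np.diag([1.0, 2.0])
D = np.diag([3.0, -1.0])
print(np.allclose(expm(C + D), expm(C) @ expm(D)))    # True
```

The theory also extends naturally to trigonometric functions, which can be defined via their Taylor series. This leads to a beautiful matrix generalization of one of mathematics' most famous identities.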
Theorem: Euler's Formula for Matrices
For any square matrix $\mathbf{A}$, $e^{i\mathbf{A}} = \cos(\mathbf{A}) + i\sin(\mathbf{A})$.
This formula reveals the deep connection between the exponential function, which describes general system evolution, and trigonometric functions, which describe oscillations.
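SciPy exposes the matrix cosine and sine directly, so the identity can be verified numerically; the example matrix below is arbitrary:

```python
import numpy as np
from scipy.linalg import expm, cosm, sinm

# Check e^{iA} = cos(A) + i sin(A) on an arbitrary real matrix.
A = np.array([[1.0, 2.0],
              [3.0, -1.0]])

lhs = expm(1j * A)
rhs = cosm(A) + 1j * sinm(A)
print(np.allclose(lhs, rhs))   # True
```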
The matrix logarithm, denoted $\log(\mathbf{A})$, is the inverse of the exponential function. A matrix $\mathbf{X}$ is a logarithm of $\mathbf{A}$ if $e^\mathbf{X} = \mathbf{A}$. Unlike the scalar case, the matrix logarithm is not unique and may not exist for all matrices. A real matrix has a real logarithm if and only if it is invertible and every Jordan block corresponding to a negative real eigenvalue occurs an even number of times. This complexity arises because multiple matrices can be mapped to the same exponential, a direct consequence of the periodic nature of the scalar exponential function in the complex plane.
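A round trip through SciPy's principal matrix logarithm, using a plane rotation as an illustrative example (its eigenvalues are $\pm i$, so a real principal logarithm exists and is the rotation's generator):

```python
import numpy as np
from scipy.linalg import expm, logm

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation by 90 degrees

L = logm(R)                        # principal logarithm of R
print(np.round(L.real, 6))         # ~ [[0, -pi/2], [pi/2, 0]]
print(np.allclose(expm(L), R))     # True: exp undoes log
```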
The matrix square root also presents unique challenges. A matrix $\mathbf{X}$ is a square root of $\mathbf{A}$ if $\mathbf{X}^2 = \mathbf{A}$. A matrix can have no square roots, a finite number, or infinitely many. However, a special and important case exists for positive definite matrices.
Theorem: The Principal Square Root
Every symmetric (or Hermitian) positive definite matrix $\mathbf{A}$ has a *unique* positive definite square root, denoted $\mathbf{A}^{1/2}$.
This unique "principal" square root has important applications, such as the generation of correlated random variables in statistics.
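A brief sketch of that statistical application, with an illustrative $2 \times 2$ covariance matrix: multiplying independent standard normal samples by the principal square root of the covariance produces samples with approximately that covariance.

```python
import numpy as np
from scipy.linalg import sqrtm

cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])            # symmetric positive definite covariance

# Principal (positive definite) square root; discard any tiny imaginary
# residue that floating-point computation may leave behind.
root = np.real(sqrtm(cov))

rng = np.random.default_rng(0)
z = rng.standard_normal((100_000, 2))   # independent standard normal samples
samples = z @ root                      # root is symmetric, so this correlates the columns

print(np.cov(samples, rowvar=False))    # ~ cov
```

Together, these functions form the core of the operator calculus, providing a powerful toolkit for analyzing a vast range of complex systems.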
Conclusion
The journey from simple matrix arithmetic to a general theory of matrix functions represents a significant leap in analytical power. We began by recognizing that polynomials of a matrix are well-defined and that their behavior is deeply tied to the matrix's eigenvalues and minimal polynomial. From this foundation, we constructed a consistent and powerful framework for applying general functions to any square matrix, whether through the lens of a Taylor series, the eigenbasis of a diagonalizable matrix, or the more general Jordan form. The result is a rich "operator calculus" where functions like the exponential, logarithm, and square root can be applied to matrices, inheriting familiar properties while also revealing new, subtle behaviors dictated by the non-commutative nature of matrix algebra. This calculus is not merely a theoretical abstraction; it is the essential language for describing and solving the continuous and discrete dynamical systems that model the world around us.