ab, ba, and the spectrum

June 5, 2012 by Qiaochu Yuan

Let $a, b$ be two $n \times n$ matrices. In general $ab \neq ba$; however, the two products share several properties. If either $a$ or $b$ is invertible, then $ab$ is conjugate to $ba$ (for instance, $ba = a^{-1}(ab)a$ when $a$ is invertible), so in particular they have the same characteristic polynomial.

What if neither $a$ nor $b$ is invertible? As it turns out, $ab$ and $ba$ still have the same characteristic polynomial, although they are not conjugate in general (e.g. we might have $ab = 0$ but $ba$ nonzero). There are several ways of proving this result, which implies in particular that $ab$ and $ba$ have the same eigenvalues.
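Here is one small instance of the parenthetical example above, checked in Python. The helper names `matmul` and `charpoly_2x2` are made up for this sketch, and the particular matrices are just one convenient choice.

```python
def matmul(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def charpoly_2x2(m):
    """Coefficients (1, -trace, det) of lambda^2 - tr(m) * lambda + det(m)."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return (1, -trace, det)

# Neither a nor b is invertible.
a = [[0, 1], [0, 0]]
b = [[1, 0], [0, 0]]

ab = matmul(a, b)  # the zero matrix
ba = matmul(b, a)  # a nonzero nilpotent matrix

print(ab, ba)
print(charpoly_2x2(ab), charpoly_2x2(ba))
```

Both products have characteristic polynomial $\lambda^2$, but they cannot be conjugate, since the only matrix conjugate to $0$ is $0$ itself.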

What if $a, b$ are linear transformations on an infinite-dimensional vector space? Do $ab$ and $ba$ still have the same eigenvalues in an appropriate sense? As it turns out, the answer is yes once we set aside $0$ and interpret “eigenvalue” suitably, and the key lemma in the proof is an interesting piece of “noncommutative high school algebra.”

Square matrices

Proposition: Let $a, b$ be two $n \times n$ matrices over an algebraically closed field $k$ . Then $ab$ and $ba$ have the same characteristic polynomial.

(Note that this is a polynomial identity in the entries of $a, b$ regarded as $2n^2$ formal variables, so to prove it identically (in particular for all commutative rings $R$ ) it suffices to prove it over a fixed algebraically closed field, e.g. we could take $k = \mathbb{C}$ .)

Proof 1. This is clear if $b$ is invertible since $ba = b(ab)b^{-1}$. The invertible matrices are Zariski-dense in all matrices (when $k = \mathbb{C}$ they are also dense in the usual topology), and the coefficients of both characteristic polynomials are polynomial functions of the entries of $a, b$, so the result follows in general.

Proof 2. Recall that $\text{tr}(ab) = \text{tr}(ba)$. (This is straightforward to prove by computation, but it also has an elegant proof using the invariant description of the trace as tensor contraction $\text{End}(V) \cong V \otimes V^{\ast} \to k$, and one can also prove it in the same way as above.) Applying this identity to the pair $a$ and $b(ab)^{m-1}$ gives $\text{tr}((ab)^m) = \text{tr}((ba)^m)$ for all $m \ge 1$, so the power sum symmetric polynomials in the eigenvalues of $ab$ and $ba$ are identical. By the Newton-Girard identities, it follows that the elementary symmetric polynomials in the eigenvalues of $ab$ and $ba$ are identical, and these are, up to sign, the coefficients of the characteristic polynomial.

(Edit, 10/25/17:) Proof 2 only works over a field of characteristic zero. Fixing it basically gives a version of Proof 3.
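The trace identity behind Proof 2 is easy to test on an example. The sketch below uses two singular integer matrices chosen only for illustration (the helper names are hypothetical) and compares $\text{tr}((ab)^m)$ with $\text{tr}((ba)^m)$ for $m = 1, 2, 3$. It checks only the power sums; passing from power sums to elementary symmetric polynomials is the step that needs division, hence characteristic zero.

```python
def matmul(p, q):
    """Product of two square matrices given as nested lists."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

def power_traces(m, count):
    """Return [tr(m), tr(m^2), ..., tr(m^count)]."""
    traces = []
    power = m
    for _ in range(count):
        traces.append(trace(power))
        power = matmul(power, m)
    return traces

# Both matrices are singular: a has two equal rows, and the second row of b
# is twice its first row.
a = [[1, 2, 0], [0, 1, 3], [1, 2, 0]]
b = [[2, 0, 1], [4, 0, 2], [1, 1, 1]]

ab, ba = matmul(a, b), matmul(b, a)
print(ab != ba)                                       # the products differ
print(power_traces(ab, 3) == power_traces(ba, 3))     # but the power traces agree
```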

Proof 3. We work universally. Let the entries of $a, b$ be formal variables $a_{ij}, b_{ij}$ in a polynomial ring $\mathbb{Z}[a_{ij}, b_{ij}]$ . Note that we have

$\det(\lambda b - bab) = \det(b) \det(\lambda I - ab) = \det(\lambda I - ba) \det(b)$

where $\lambda$ is another (scalar) variable and $I$ is the identity matrix. Since $\mathbb{Z}[a_{ij}, b_{ij}][\lambda]$ is an integral domain and $\det(b)$ is a nonzero polynomial in it, we can cancel $\det(b)$ from both sides to obtain

$\displaystyle \det(\lambda I - ab) = \det(\lambda I - ba)$ .
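To see why Proof 3 insists on formal variables, it may help to plug in a specific singular $b$. In the sketch below (hypothetical helper names, sample matrices chosen for illustration) the identity $\det(\lambda b - bab) = \det(b) \det(\lambda I - ab) = \det(\lambda I - ba) \det(b)$ collapses to $0 = 0$, so no cancellation is possible for this particular pair. The conclusion nevertheless holds at every integer $\lambda$ tried, as the universal argument predicts. The determinant is computed by cofactor expansion, which uses only addition and multiplication.

```python
def matmul(p, q):
    """Product of two square matrices given as nested lists."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def lam_i_minus(lam, m):
    """Return lam * I - m."""
    n = len(m)
    return [[(lam if i == j else 0) - m[i][j] for j in range(n)]
            for i in range(n)]

a = [[1, 2, 0], [0, 1, 3], [1, 2, 0]]
b = [[2, 0, 1], [4, 0, 2], [1, 1, 1]]   # singular: second row is twice the first
ab, ba = matmul(a, b), matmul(b, a)
bab = matmul(b, ab)
n = len(a)

for lam in range(-2, 3):
    lam_b_minus_bab = [[lam * b[i][j] - bab[i][j] for j in range(n)]
                       for i in range(n)]
    left = det(b) * det(lam_i_minus(lam, ab))
    right = det(lam_i_minus(lam, ba)) * det(b)
    # For this singular b every term is 0, so nothing can be cancelled.
    assert det(lam_b_minus_bab) == left == right == 0
    # The conclusion of Proof 3 still holds at each integer lam tried.
    assert det(lam_i_minus(lam, ab)) == det(lam_i_minus(lam, ba))
```

Two monic polynomials of degree $3$ that agree at five points are equal, so for this pair the check above does pin down the full characteristic polynomial.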

Eigencomplications

What can we say beyond $n \times n$ matrices? It might be reasonable to guess that $ab$ and $ba$ still have the same eigenvalues if $a, b$ are linear transformations on an infinite-dimensional vector space; however, since no analogue of the characteristic polynomial, the trace, or the determinant is available in general in this setting, the proofs above don’t generalize, and in fact the result is false.

Example. Consider the differential operators $x \frac{d}{dx}$ and $\frac{d}{dx} x = x \frac{d}{dx} + 1$ acting on $k[x]$ ( $k$ a field of characteristic zero). The former has eigenvectors $x^n$ with eigenvalues $n$ while the latter has eigenvectors $x^n$ with eigenvalues $n+1$ ; thus the eigenvalues of the former are $\{ 0, 1, 2, ... \}$ while the eigenvalues of the latter are $\{ 1, 2, ... \}$ .
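This example can be checked mechanically. The following sketch represents a polynomial as a list of coefficients (index = degree); the representation and the helper names are choices made for this illustration, and integers stand in for the field $k$.

```python
def times_x(p):
    """Multiply a polynomial (coefficient list, index = degree) by x."""
    return [0] + list(p)

def derivative(p):
    """Differentiate a polynomial given as a coefficient list."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

def monomial(n):
    """Coefficient list of x^n."""
    return [0] * n + [1]

def scale(c, p):
    return [c * coeff for coeff in p]

def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

for n in range(6):
    xn = monomial(n)
    x_ddx = times_x(derivative(xn))   # (x d/dx) applied to x^n
    ddx_x = derivative(times_x(xn))   # (d/dx x) applied to x^n
    assert trim(x_ddx) == trim(scale(n, xn))        # eigenvalue n
    assert trim(ddx_x) == trim(scale(n + 1, xn))    # eigenvalue n + 1
```

In particular the constant polynomial $1$ is killed by $x \frac{d}{dx}$ but is fixed by $\frac{d}{dx} x$, which is where the eigenvalue $0$ gets lost.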

$x$ and $\frac{d}{dx}$ provide examples of other infinite-dimensional phenomena too: they are endomorphisms of a vector space like $n \times n$ matrices, but one is injective without being surjective, one is surjective without being injective, and $x$ doesn’t have any eigenvectors whatsoever (even if $k$ is algebraically closed)!

The issue that occurs in the above example is the following. Let $a, b$ be two linear operators on a vector space $V$ . Suppose that $v$ is an eigenvector for $ab$ , thus

$ab v = \lambda v$

for some $\lambda$ . Then

$bab v = ba(bv) = b \lambda v = \lambda (bv)$

so that $bv$ is an eigenvector for $ba$ with the same eigenvalue, unless $bv = 0$. But in that case $abv = 0$, so $\lambda = 0$ (since $v \neq 0$). If $\lambda \neq 0$, then this cannot occur. Exchanging the roles of $a$ and $b$ gives the reverse implication, so we have proven the following.

Proposition: Let $a, b$ be two linear transformations on a vector space $V$ . Then $\lambda$ is a nonzero eigenvalue of $ab$ if and only if it is a nonzero eigenvalue of $ba$ .
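A finite-dimensional sanity check of the argument, with a pair of singular $2 \times 2$ matrices picked for illustration (the helper names are hypothetical): an eigenvector $v$ of $ab$ with nonzero eigenvalue is carried by $b$ to an eigenvector of $ba$, while a vector in the kernel of $ab$ can be sent to $0$.

```python
def matmul(p, q):
    """Product of two square matrices given as nested lists."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def apply(m, v):
    """Apply the matrix m to the column vector v."""
    return [sum(entry * x for entry, x in zip(row, v)) for row in m]

a = [[1, 1], [0, 0]]
b = [[1, 0], [1, 0]]
ab, ba = matmul(a, b), matmul(b, a)

# Nonzero eigenvalue: v is an eigenvector of ab with eigenvalue 2.
v, lam = [1, 0], 2
assert apply(ab, v) == [lam * x for x in v]
bv = apply(b, v)
assert bv != [0, 0]
assert apply(ba, bv) == [lam * x for x in bv]   # bv is an eigenvector of ba

# Zero eigenvalue: w is killed by ab, and b sends it to 0, which is not an
# eigenvector, so the argument gives no information here.
w = [0, 1]
assert apply(ab, w) == [0, 0]
assert apply(b, w) == [0, 0]
```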

Thus the only possible discrepancy in the sets of eigenvalues occurs when $\lambda = 0$ , and the example of $a = x, b = \frac{d}{dx}$ shows that this can occur.

Unfortunately, as we have seen, if $V$ is infinite-dimensional there are linear transformations with no eigenvectors whatsoever, such as $x$ acting on $k[x]$ . Is there no hope of generalizing the above result to such linear transformations?

The spectrum

When we talk about eigenvalues, what are we really talking about? On a finite-dimensional vector space $V$, to say that a linear transformation $T : V \to V$ has an eigenvector $v$ with eigenvalue $\lambda$ is to say that $Tv = \lambda v$, or $(T - \lambda) v = 0$, for some nonzero $v$. Such a $v$ exists if and only if $T - \lambda$ fails to be invertible, which suggests the following definition.

Definition: Let $T : V \to V$ be a linear transformation on a vector space over a field $k$ . The spectrum $\sigma(T)$ consists of all $\lambda \in k$ such that $\lambda - T$ is not invertible.

If $\lambda - T$ is not invertible because it is not injective, then $\lambda$ is an eigenvalue; in functional analysis terms, $\lambda$ lies in the point spectrum $\sigma_p(T)$. However, $\lambda - T$ may also be injective and still fail to be invertible because it is not surjective, and when this happens for some $\lambda$ the spectrum is strictly larger than the point spectrum.

Example. The linear operator $x$ acting on $k[x]$ has spectrum all of $k$ even though it has no point spectrum.
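The reason is that every polynomial of the form $(\lambda - x) g$ vanishes at $\lambda$, so the constant polynomial $1$ is never in the image, while multiplying a nonzero $g$ by $\lambda - x$ raises its degree and so never gives $0$. The sketch below spot-checks both facts over $k = \mathbb{Q}$ on a few sample polynomials; it illustrates the claim rather than proving it, and the helper names are invented here.

```python
from fractions import Fraction

def lam_minus_x(lam, g):
    """Coefficients of (lam - x) * g, with g a coefficient list (index = degree)."""
    out = [Fraction(0)] * (len(g) + 1)
    for k, c in enumerate(g):
        out[k] += lam * c
        out[k + 1] -= c
    return out

def evaluate(p, point):
    """Evaluate a polynomial at a point using Horner's rule."""
    value = Fraction(0)
    for c in reversed(p):
        value = value * point + c
    return value

samples = [
    [Fraction(1)],                               # 1
    [Fraction(2), Fraction(-1)],                 # 2 - x
    [Fraction(0), Fraction(3), Fraction(1, 2)],  # 3x + x^2 / 2
]

for lam in [Fraction(0), Fraction(1), Fraction(-5, 3)]:
    for g in samples:
        image = lam_minus_x(lam, g)
        assert evaluate(image, lam) == 0    # the image vanishes at lam
        assert len(image) == len(g) + 1     # the degree goes up by one,
        assert image[-1] != 0               # so the image is nonzero
```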

Example. The linear operator $x$ acting on $k(x)$ has empty spectrum.

Example. Let $X$ be a topological space, $V = C(X)$ be the space of continuous functions $X \to \mathbb{C}$ , and let $T_f : V \to V$ be multiplication by some function $f \in V$ . Then the spectrum of $T_f$ is precisely the range of $f$ . (This is one basic reason it’s reasonable to think of eigenvalues of operators in quantum mechanics as being values of functions such as position and momentum.) In many cases, $T_f$ has no point spectrum (e.g. $X = [0, 1], f = x$ ).

There is a nice connection between spectra in this sense and spectra of rings in algebraic geometry. A simple version is as follows. If $T : V \to V$ is a linear transformation of a finite-dimensional vector space $V$ over an algebraically closed field $k$ , then we can consider the ring $k[T]$ generated by $T$ in $\text{End}(V)$ . This ring is isomorphic to $k[x]/m(x)$ where $m$ is the minimal polynomial of $T$ , so the spectrum of $k[T]$ can naturally be identified with the set of eigenvalues (that is, the spectrum) of $T$ !

A more general connection can be obtained by generalizing the definition of spectrum.

Definition: Let $k$ be a field and $R$ a $k$ -algebra. Let $a \in R$ . The spectrum $\sigma(a)$ of $a$ consists of all $\lambda \in k$ such that $\lambda - a$ is not invertible in $R$ .

(We recover the definition applied to linear operators by taking $R = \text{End}(V)$ .)

Now observe that if $R$ is commutative, then $\lambda - a$ is not invertible if and only if it is contained in a maximal ideal $m$ . This maximal ideal has the property that $a = \lambda$ in $R/m$ , thus thinking of $a$ as a function on $\text{MaxSpec } R$ it follows that $a$ takes on every value in its spectrum at an appropriate maximal ideal! If the residue fields of $R$ are all $k$ (which occurs for example if $k$ is algebraically closed and $R$ is finitely-generated by the Nullstellensatz) then the converse is also true.

We are now ready to state the appropriate generalization of the proposition above about characteristic polynomials.

Proposition: Let $a, b$ be two elements of a $k$ -algebra $R$ . Then $\sigma(ab) \setminus \{ 0 \} = \sigma(ba) \setminus \{ 0 \}$ .

The crucial lemma

The proposition claims that if $\lambda$ is nonzero, then $\lambda - ab$ is invertible if and only if $\lambda - ba$ is invertible. Equivalently, $1 - \frac{ab}{\lambda}$ is invertible if and only if $1 - \frac{ba}{\lambda}$ is invertible. By setting $a' = \frac{a}{\lambda}$ , we see that the following piece of “noncommutative high school algebra” implies and in fact generalizes our proposition.

Lemma: Let $a, b$ be elements of a ring $R$ . Then $1 - ab$ is invertible if and only if $1 - ba$ is invertible.

This lemma is somewhat infamous for having the following “proof.” Pretend that it makes sense to write

$\displaystyle (1 - ab)^{-1} = 1 + ab + abab + ababab + ...$

in a general ring. Then

$\displaystyle b (1 - ab)^{-1} a = ba + baba + bababa + ... = (1 - ba)^{-1} - 1$ .

And indeed, if we write $c = (1 - ab)^{-1}$ , then we dutifully find that

$\displaystyle (1 - ba)(bca + 1) = bca - babca + 1 - ba = 1 + b(1 - ab)ca - ba = 1$

and similarly

$\displaystyle (bca + 1)(1 - ba) = bca + 1 - bcaba - ba = 1 + bc(1 - ab)a - ba = 1$ .
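The formula can be tested in the ring of $2 \times 2$ rational matrices. In this sketch neither $a$ nor $b$ is invertible, $c$ is computed by Gauss-Jordan elimination, and $bca + 1$ is checked to be a two-sided inverse of $1 - ba$. The matrices and helper names are illustrative choices, not anything taken from the discussion above.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(p, q):
    """Product of two square matrices given as nested lists."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(p, q):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(p, q)]

def sub(p, q):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(p, q)]

def inverse(m):
    """Gauss-Jordan elimination over the rationals; raises ValueError if singular."""
    n = len(m)
    unit = identity(n)
    aug = [[Fraction(x) for x in row] + unit[i] for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

a = [[1, 1], [0, 0]]   # singular
b = [[1, 0], [1, 0]]   # singular
one = identity(2)

c = inverse(sub(one, matmul(a, b)))          # c = (1 - ab)^(-1)
candidate = add(matmul(matmul(b, c), a), one)  # bca + 1
one_minus_ba = sub(one, matmul(b, a))

assert matmul(one_minus_ba, candidate) == one
assert matmul(candidate, one_minus_ba) == one
```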

Halmos once posed the problem of explaining in what sense the geometric series proof works. There is a discussion of this problem on MO which I would summarize as follows. The universal ring describing this problem is the free ring

$\displaystyle \mathbb{Z} \langle a, b, c \rangle / (c(1 - ab) - 1, (1 - ab)c - 1)$

on two elements $a, b$ , and an inverse to $1 - ab$ . If we show that $1 - ba$ has an inverse in this ring, then we are done by the universal property. And the idea is that this ring ought to embed in a suitable ring of formal power series where one can make sense of geometric series expansions. To make this easier, we’ll work with a different ring

$\displaystyle R = \mathbb{Z} \langle a, b, c \rangle [t] / (c(1 - tab) - 1, (1 - tab)c - 1)$

where $t$ is a new central variable, and we want to embed $R$ into $\mathbb{Z} \langle a, b \rangle [[t]]$ by sending $c$ to $1 + tab + t^2 abab + ...$. In the power series ring the geometric series argument makes perfect sense and proves that the image of $tbca + 1$ is an inverse to the image of $1 - tba$; granting injectivity, the same holds in $R$. Applying the evaluation homomorphism out of $R$ obtained by setting $t = 1$ then solves our original problem.
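What makes the power series computation work is a term-by-term match: the coefficient of $t^{k+1}$ in $tbca$ is $b (ab)^k a$, and this must equal $(ba)^{k+1}$, the coefficient of $t^{k+1}$ in the geometric series for $1 - tba$. That is just associativity, but it can be spot-checked on sample integer matrices as below (hypothetical helper names). The check says nothing about the injectivity question.

```python
def matmul(p, q):
    """Product of two square matrices given as nested lists."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(m, e):
    """The e-th power of m, with m^0 the identity matrix."""
    n = len(m)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(e):
        result = matmul(result, m)
    return result

a = [[1, 2], [0, 1]]
b = [[0, 1], [3, 1]]
ab, ba = matmul(a, b), matmul(b, a)

for k in range(6):
    lhs = matmul(matmul(b, matpow(ab, k)), a)  # coefficient of t^(k+1) in t*b*c*a
    rhs = matpow(ba, k + 1)                    # coefficient of t^(k+1) in the series for 1 - t*ba
    assert lhs == rhs
```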

However, it does not seem trivial to me to prove that the natural map from $R$ to $\mathbb{Z} \langle a, b \rangle [[t]]$ is actually injective, and nobody in the MO discussion above seems to actually prove this.


8 Responses

on December 9, 2015 at 12:57 pm | Martin Argerami

Hi Qiaochu, I don’t know if you have any plans to update this, but the rectangular matrix case can be dealt with very easily after having done the square case: see http://math.stackexchange.com/a/332688/22857

on May 29, 2014 at 6:22 am | LinAlgMan

In Proof 3 for square matrices, you wrote an argument of the form $\det(B) \det(X) = \det(Z) = \det(Y) \det(B)$, and since we work over an integral domain we can cancel $\det(B)$ from both sides. Put another way, $$ \det(B) \cdot ( \det(X)-\det(Y) ) = 0 .$$

While the image of the det homomorphism is an integral domain, its domain is not! You can have a non-zero non-invertible matrix $B$ and then $\det(B)=0$ and you can’t say anything about $\det(X)-\det(Y)$. So the proof is valid only when both $A$ and $B$ are invertible (or at least when $A$ and $B$ have a non-zero determinant).
- on May 29, 2014 at 10:06 am | Qiaochu Yuan
  
  In proof 3 the entries of the matrices are all algebraically independent formal variables; in particular, the determinant is literally the determinant as a polynomial in the entries, and so is definitely nonzero. That’s the benefit of working universally. (That is, in that proof the matrices aren’t particular matrices; they’re literally the universal pair of matrices.)
on June 19, 2012 at 10:40 am | Dinesh valluri

In Proof 3, why is $\det(\lambda I - bab) = \det(b) \det(\lambda I - ab)$?
- on June 23, 2012 at 1:04 pm | Qiaochu Yuan
  
  It’s not, of course! I meant $\det (\lambda b - bab)$ .
on June 16, 2012 at 3:46 am | Amos

In Proof 3 you missed a $b$ in the first determinant: $\det(b \lambda I - bab) = \det(b) \det(\lambda I - ab) = \det(b) \det(\lambda I - ba)$. And $K[T]$ is not isomorphic to $K[x]/m(x)$ where $m$ is the characteristic polynomial, but to $K[x]/p(x)$ where $p$ is the minimal polynomial (otherwise, for example, $K[\text{Id}]$ would be $n$-dimensional rather than $1$-dimensional).
- on June 23, 2012 at 1:02 pm | Qiaochu Yuan
  
  Thanks! Those were both typos; I meant the minimal polynomial (that’s why I called it $m$ !).

Annoying Precision

"A good stock of examples, as large as possible, is indispensable for a thorough understanding of any concept, and when I want to learn something new, I make it my first job to build one." – Paul Halmos